{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Otto Product Classification with LightGBM\n",
    "Uses the original features plus TF-IDF features,\n",
    "with `boosting_type` set to `goss`."
   ]
  },
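  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Using the data from the Otto Group Product Classification Challenge, hosted on Kaggle in 2015, as an example, we tune the hyperparameters of LightGBM.\n",
    "\n",
    "The Otto dataset, provided by the e-commerce company Otto, poses a multi-class product classification problem with 9 classes. Each sample has 93 numeric features (integers counting how many times some event occurred; the features have been anonymized). Competition page: https://www.kaggle.com/c/otto-group-product-classification-challenge/data\n",
    "\n",
    "\n",
    "1st place: https://www.kaggle.com/c/otto-group-product-classification-challenge/discussion/14335\n",
    "\n",
    "2nd place: http://blog.kaggle.com/2015/06/09/otto-product-classification-winners-interview-2nd-place-alexander-guschin/"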
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Import the required modules first\n",
    "import pandas as pd \n",
    "import numpy as np\n",
    "\n",
    "import lightgbm as lgbm\n",
    "from lightgbm.sklearn import LGBMClassifier\n",
    "\n",
    "from sklearn.model_selection import GridSearchCV\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Load the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": false
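   },
   "outputs": [],
   "source": [
    "# Load the data\n",
    "# We use the original features plus the TF-IDF features. The log(x+1) features are a monotonic transform\n",
    "# of the original features, so adding them would make little difference to decision-tree models\n",
    "# path to where the data lies\n",
    "dpath = './data/'\n",
    "\n",
    "train1 = pd.read_csv(dpath +\"Otto_FE_train_org.csv\")\n",
    "train2 = pd.read_csv(dpath +\"Otto_FE_train_tfidf.csv\")\n",
    "\n",
    "# Drop the duplicated id and target columns before concatenating\n",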
    "train2 = train2.drop([\"id\",\"target\"], axis=1)\n",
    "train =  pd.concat([train1, train2], axis = 1, ignore_index=False)\n",
    "train.head()\n",
    "\n",
    "del train1\n",
    "del train2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Prepare the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert the class strings to integers; LightGBM does not accept string-valued features or labels\n",
    "y_train = train['target'] # of the form Class_x\n",
    "y_train = y_train.map(lambda s: s[6:])\n",
    "y_train = y_train.map(lambda s: int(s) - 1) # map Class_x to an integer in 0-8\n",
    "\n",
    "X_train = train.drop([\"id\", \"target\"], axis=1)\n",
    "\n",
    "# Save the feature names for later use (visualization)\n",
    "feat_names = X_train.columns \n",
    "\n",
    "# Most sklearn learners support sparse input, which can make training much faster\n",
    "# To check whether a learner supports sparse data, see whether its fit function accepts X: {array-like, sparse matrix}.\n",
    "# You can use timeit to compare training times on dense and sparse data yourself\n",
    "from scipy.sparse import csr_matrix\n",
    "X_train = csr_matrix(X_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## LightGBM超参数调优"
   ]
  },
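  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main LightGBM hyperparameters are:\n",
    "1. The number of trees, n_estimators, and the learning rate, learning_rate\n",
    "2. The maximum tree depth, max_depth, and the maximum number of leaves per tree, num_leaves (note: XGBoost is typically controlled through max_depth alone, but LightGBM grows trees leaf-wise, so num_leaves matters a lot and should be set below 2^max_depth)\n",
    "3. The minimum number of samples in a leaf: min_data_in_leaf (aliases: min_data, min_child_samples)\n",
    "4. The column sampling ratio per tree: feature_fraction (alias: colsample_bytree)\n",
    "5. The row sampling ratio per tree: bagging_fraction (alias: subsample); bagging_freq=1 must be set as well\n",
    "6. The regularization parameters lambda_l1 (reg_alpha) and lambda_l2 (reg_lambda)\n",
    "\n",
    "7. Two parameters that do not control model complexity but do affect speed and accuracy; they can be adjusted to the range of the feature values and the number of samples:\n",
    "   1. The maximum number of bins per feature, max_bin: default 255\n",
    "   2. The number of samples used to build the histograms, subsample_for_bin: default 200000\n",
    "\n",
    "n_estimators is tuned with LightGBM's built-in cv function: as with XGBoost, cross-validation is built into the training routine, which makes it very fast.\n",
    "The other parameters are tuned with GridSearchCV."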
   ]
  },
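  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough illustration of the num_leaves constraint in item 2, the sketch below checks a candidate setting against the 2^max_depth cap. The helper name `leaf_budget` is hypothetical and not part of LightGBM; it only restates the rule of thumb from the list above in plain Python.\n",
    "\n",
    "```python\n",
    "def leaf_budget(num_leaves, max_depth):\n",
    "    # A binary tree of depth max_depth has at most 2 ** max_depth leaves\n",
    "    cap = 2 ** max_depth\n",
    "    # Return the cap and whether num_leaves stays strictly below it\n",
    "    return cap, num_leaves < cap\n",
    "\n",
    "# (num_leaves, max_depth) pairs used in this notebook\n",
    "for num_leaves, max_depth in [(60, 6), (70, 7)]:\n",
    "    cap, ok = leaf_budget(num_leaves, max_depth)\n",
    "    print(num_leaves, max_depth, cap, ok)\n",
    "```\n",
    "\n",
    "Both pairs stay under the cap, so num_leaves remains the setting that actually bounds the size of each tree."
   ]
  },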
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "MAX_ROUNDS = 10000"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Use the same cross-validation folds"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "# prepare cross validation\n",
    "from sklearn.model_selection import StratifiedKFold\n",
    "\n",
    "kfold = StratifiedKFold(n_splits=3, shuffle=True, random_state=3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1. n_estimators"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Call LightGBM's built-in cross-validation (cv) directly; it evaluates every successive value of n_estimators in a single pass,\n",
    "# whereas GridSearchCV can only cross-validate a finite set of parameter values and is comparatively slow\n",
    "def get_n_estimators(params , X_train , y_train , early_stopping_rounds=10):\n",
    "    lgbm_params = params.copy()\n",
    "    lgbm_params['num_class'] = 9\n",
    "     \n",
    "    lgbmtrain = lgbm.Dataset(X_train , y_train )\n",
    "     \n",
    "    # num_boost_round is the maximum number of weak learners; because early_stopping_rounds is set,\n",
    "    # training stops once the CV metric has gone that many rounds without improving\n",
    "    cv_result = lgbm.cv(lgbm_params , lgbmtrain , num_boost_round=MAX_ROUNDS , nfold=3,  metrics='multi_logloss' , early_stopping_rounds=early_stopping_rounds,seed=3 )\n",
    "     \n",
    "    print('best n_estimators:' , len(cv_result['multi_logloss-mean']))\n",
    "    print('best cv score:' , cv_result['multi_logloss-mean'][-1])\n",
    "     \n",
    "    return len(cv_result['multi_logloss-mean'])"
   ]
  },
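  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The early-stopping rule described in the comments above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not LightGBM's actual implementation: the function name `best_round_with_early_stopping` and the loss values are made up for the example.\n",
    "\n",
    "```python\n",
    "def best_round_with_early_stopping(losses, patience):\n",
    "    # Track the lowest loss seen so far and the (1-based) round where it occurred\n",
    "    best_loss = float('inf')\n",
    "    best_round = 0\n",
    "    for round_no, loss in enumerate(losses, start=1):\n",
    "        if loss < best_loss:\n",
    "            best_loss = loss\n",
    "            best_round = round_no\n",
    "        elif round_no - best_round >= patience:\n",
    "            # No improvement for patience consecutive rounds: stop here\n",
    "            break\n",
    "    return best_round, best_loss\n",
    "\n",
    "# Hypothetical per-round validation losses\n",
    "losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58, 0.50]\n",
    "print(best_round_with_early_stopping(losses, patience=3))\n",
    "```\n",
    "\n",
    "With a patience of 3 the loop stops before it ever reaches the final 0.50, which is the trade-off early stopping makes: a small patience saves time but can miss a late improvement."
   ]
  },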
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "best n_estimators: 368\n",
      "best cv score: 0.48879701149292193\n"
     ]
    }
   ],
   "source": [
    "params = {'boosting_type': 'goss',\n",
    "          'objective': 'multiclass',\n",
    "          'n_jobs': 4,\n",
    "          'learning_rate': 0.1,\n",
    "          'num_leaves': 60,\n",
    "          'max_depth': 6,\n",
    "          'max_bin': 127, #2^6,原始特征为整数，很少超过100\n",
    "          'colsample_bytree': 0.7,\n",
    "         }\n",
    "\n",
    "n_estimators_1 = get_n_estimators(params , X_train , y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2. num_leaves & max_depth=7\n",
    "num_leaves建议70-80，搜索区间50-80,值越大模型越复杂，越容易过拟合\n",
    "相应的扩大max_depth=7"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 3 folds for each of 4 candidates, totalling 12 fits\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.\n",
      "[Parallel(n_jobs=4)]: Done   8 out of  12 | elapsed: 20.3min remaining: 10.2min\n",
      "[Parallel(n_jobs=4)]: Done  12 out of  12 | elapsed: 31.0min finished\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "GridSearchCV(cv=StratifiedKFold(n_splits=3, random_state=3, shuffle=True),\n",
       "             error_score='raise-deprecating',\n",
       "             estimator=LGBMClassifier(boosting_type='goss', class_weight=None,\n",
       "                                      colsample_bytree=0.7,\n",
       "                                      importance_type='split',\n",
       "                                      learning_rate=0.1, max_bin=127,\n",
       "                                      max_depth=7, min_child_samples=20,\n",
       "                                      min_child_weight=0.001,\n",
       "                                      min_split_gain=0.0, n_estimators=368,\n",
       "                                      n_jobs=4, num_class=9, num_leaves=31,\n",
       "                                      objective='multiclass', random_state=None,\n",
       "                                      reg_alpha=0.0, reg_lambda=0.0,\n",
       "                                      silent=False, subsample=1.0,\n",
       "                                      subsample_for_bin=200000,\n",
       "                                      subsample_freq=0),\n",
       "             iid='warn', n_jobs=4, param_grid={'num_leaves': range(50, 90, 10)},\n",
       "             pre_dispatch='2*n_jobs', refit=False, return_train_score='warn',\n",
       "             scoring='neg_log_loss', verbose=5)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "params = {'boosting_type': 'goss',\n",
    "          'objective': 'multiclass',\n",
    "          'num_class':9, \n",
    "          'n_jobs': 4,\n",
    "          'learning_rate': 0.1,\n",
    "          'n_estimators':n_estimators_1,\n",
    "          'max_depth': 7,\n",
    "          'max_bin': 127, # 2^7 - 1; the original features are integers that rarely exceed 100\n",
    "          'colsample_bytree': 0.7,\n",
    "         }\n",
    "lg = LGBMClassifier(silent=False,  **params)\n",
    "\n",
    "num_leaves_s = range(50,90,10) #50,60,70,80\n",
    "tuned_parameters = dict( num_leaves = num_leaves_s)\n",
    "\n",
    "grid_search = GridSearchCV(lg, n_jobs=4, param_grid=tuned_parameters, cv = kfold, scoring=\"neg_log_loss\", verbose=5, refit = False,return_train_score='warn')\n",
    "grid_search.fit(X_train , y_train)\n",
    "#grid_search.best_estimator_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.49037893150957024\n",
      "{'num_leaves': 50}\n"
     ]
    }
   ],
   "source": [
    "# examine the best model\n",
    "print(-grid_search.best_score_)\n",
    "print(grid_search.best_params_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZUAAAEHCAYAAABm9dtzAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAgAElEQVR4nO3deXhV5bn///ediZAAYQjzFEYVUVFSxAkZxKqtohatY52q1uOMdrjO+Z7++mtPv22pDE5VcUDbU+faFu2giIAoggYUFRkS5jAmzIQhJLm/f+wVCBhIDHtn7Z18Xte1r2SvvfZa98Mi+eR51trPMndHREQkGpLCLkBERBoOhYqIiESNQkVERKJGoSIiIlGjUBERkahJCbuAMGVnZ3tOTk7YZYiIJJR58+YVu3vb6l5r1KGSk5NDXl5e2GWIiCQUM1t1pNc0/CUiIlGjUBERkahRqIiISNQoVEREJGoUKiIiEjUKFRERiRqFioiIRI1CRRqM2QXFrNpcEnYZIo1ao/7wozQcC9dt55pn5pKcZFyZ24W7h/ehU8umYZcl0uiopyINwoSp+bRIT+GaQd14fV4hQx+awS/f/IriXfvCLk2kUVGoSML7bM023l20kduG9ORXl/Zn+oNDuXRAJ56fvYIhY6fz+7cXs333/rDLFGkUFCqS8MZPXUqrjFRuPKsHAF1aZTB29Cm8O+ZcRpzQnsenL+Psse/x6LR8du0rC7lakYZNoSIJ7ZOVW3h/aRF3DO1FsyaHniLs2bYZj159Kv+85xxO79GacVOXcu7Y6Twzazl795eHVLFIwxbTUDGzC8xsiZkVmNnPjrLeaDNzM8sNnqeZ2WQz+8LMFpjZ0GB5hpn9w8wWm9lCM/ttlW3caGZFZvZZ8PhhLNsm8WHcO0to27wJ1w/OOeI6/Tq14JkbvsUb/3Emx3dszv/8YxFDfz+DP89dxf7yivorVqQRiFmomFky8DhwIdAPuNrM+lWzXnPgHmBulcW3Arj7ScBIYJyZVdb6kLsfD5wKnGVmF1Z53yvuPiB4PBP1RklcmV1QzJzlW7hzaC+apiXXuP5p3Vrx5x8O5sVbT6dTy3T+669fMmLcTN6YX0h5hddDxSINXyx7KoOAAndf7u6lwMvAqGrW+xUwFthbZVk/YBqAu28CtgG57r7b3acHy0uB+UCX2DVB4pW789A7S+iYlc5Vg7p9o/ee2Subv9xxJs/dmEuzJimMeXUBF0x8n399sR53hYvIsYhlqHQG1lR5XhgsO8DMTgW6uvtbh713ATDKzFLMrAcwEOh62HtbAhcThE/ge2b2uZm9bmaHrF/lfbeZWZ6Z5RUVFdWpYRK+GUuLmL96G3cN7016as29lMOZGcOPb89bd5/N49ecRoU7d/x5Phc/9gHTl2xSuIjUUSxDxapZduAnNRjOmgA8UM16zxEJoTxgIjAbKKvy3hTgJeARd18eLH4TyHH3k4F3gReqK8rdJ7l7rrvntm1b7d0wJc65O+PfWUrX1k25YmC1fzvUWlKS8Z2TO/L2fUN46IpT2LZ7PzdN/oQrn/qIOcs3R6likcYjlqFSyKG9iy7AuirPmwP9gRlmthIYDEwxs1x3L3P3+4NzI6OAlkB+lfdOAvLdfWLlAnff7O6Vn3R7mkjvRhqgd77ayBdrt3PP8D6kpUTnv3BKchKjB3bhvQeG8qtL+7N6y26umjSH65+dy4I126KyD5HGIJah8gnQx8x6mFkacBUwpfJFd9/u7tnunuPuOcAc4BJ3zwuu8soEMLORQJm7fxU8/x8gC7iv6s7MrGOVp5cAi2LYNglJRYUzYepSemZnctmpnWt+wzeUlpLE9YO7M/PHw/ivi07gy7XbGfX4h9z6xzwWb9gR9f2JNDQxCxV3LwPuAt4m8gv+VXdfaGa/NLNLanh7O2C+mS0CfgpcD2BmXYD/InIif/5hlw7fE1xmvIDI1WQ3
Rr1RErp/fLGexRt2cu95fUhJjt3fROmpydw6pCezfjqcMSP7MmfZZi58eBb3vPQpK4o1aaXIkVhjPiGZm5vreXl5YZchtVRe4Zw/YSbJSca/7x1CUlJ1p+1iY9vuUp56fznPf7iS0vIKrhjYhbtH9KGzJq2URsjM5rl7bnWv6RP1kjD+/tlalhWVMGZk33oNFICWGWn89ILjmfmToVw/uDtvzF/LsN/P4BdTFlK0U5NWilRST0U9lYSwv7yCEeNm0jw9hbfuPhuz+g2Vw63dtodHp+Xz2rxC0pKTuPGsHG4f0pOWGWmh1iVSH9RTkYT3l3mFrN6ymwfO7xt6oAB0btmU337vZN4dcy7nn9ieJ2cu45zfTecRTVopjZxCReLevrJyHpmWz4CuLRl2XLuwyzlEj+xMHr7qVP517zmc0asN46cuZcjY6Tz9viatlMZJoSJx75VP1rBu+9646aVU5/gOLZj0g1z+dudZnNipBb/+5yLO/f10/jRnFaVlmrRSGg+FisS1vfvLeey9Agb1aM3ZvbPDLqdGA7q25E+3nM7Ltw2ma6sM/vtvXzJi/Axen6dJK6VxUKhIXPvfOavYtHMfD4yM315KdQb3bMNrPzqDyTd9i6ymqTz42gLOnzCTf3y+ngqFizRgChWJWyX7ynhixjLO6ZPN6T3bhF3ON2ZmDDuuHW/edTZPXHsaZsadL87nu49+wHuLN2rSSmmQFCoSt56fvZLNJaWMGdk37FKOiZlx4UmRSSvHX3kKu/aVcfPzeYx+8iNmLysOuzyRqFKoSFzasXc/k95fzvDj23Fqt1ZhlxMVyUnG5ad1YdoD5/Lry/qzdusernl6Ltc+M4dPV28NuzyRqFCoSFx67oMVbN+zP+F7KdVJTU7i2tO7M+PHQ/k/3zmBxet3ctkfZvPDFz5h0XpNWimJTaEicWfb7lKenbWCC07sQP/OWWGXEzPpqcn88JyevP+TYTx4fl/mrtjChQ/P4q4X57OsaFfY5YnUiUJF4s6k95ezq7SM+xtgL6U6mU1SuGt4Hz74yXDuHNaL9xZvYuT4mfz4tQUUbt0ddnki34hCReJK8a59TP5wJRef3InjOjQPu5x6lZWRyo+/fTzv/2QYN57Zg78vWMewh2bw//39Szbt2Bt2eSK1olCRuPLkjGXsKyvn3vP6hF1KaLKbNeHnF/djxoNDGT2wK3+eu5ohv5/Ob/61iK0lpWGXJ3JUChWJGxt37OVPc1Zx+Wld6NW2WdjlhK5Ty6b85vKTmPbAuVzYvyOT3l/OkLHTmfjuUnbu3R92eSLVUqhI3Hh8egHlFc69IxpvL6U63dtkMuH7A3j7viGc1Tubie/mM2TsdJ6auYw9pZq0UuKLQkXiQuHW3bz08WquyO1K19YZYZcTl/q2b86T1w/kzbvO5uQuLfnNvxYz5PfT+eNHKzVppcQNhYrEhcfeK8Aw7h7eO+xS4t5JXbJ44eZBvHr7GfRok8nP/76QYQ/N4NW8NZSVK1wkXAoVCd2qzSW8Nq+Qa07vRifd873WBvVozSu3D+aFmwfRplkaP3n9c86f+D5vLlinSSslNAoVCd3D0/JJTTb+Y2ivsEtJOGbGuX3b8vc7z+LJ6waSkmTc/dKnfOfRD3j3K01aKfVPoSKhKti0i799upYfnJFDuxbpYZeTsMyMC/p34F/3DmHi9wewu7SMH/4xj8ufmM2HBZq0UuqPQkVCNfHdpaSnJnP7kJ5hl9IgJCcZl57amXfHnMtvLj+JDdv3cu0zc7l60hzmrdKklRJ7ChUJzaL1O3jr8/XcfFYP2jRrEnY5DUpqchJXD+rG9AeH8vPv9iN/006+98Rsbn7+Exau2x52edKAKVQkNBOmLqV5egq3nqNeSqykpyZz89k9mPnjYfz428eRt3IL33nkA+58cT4FmzRppUSfQkVC8UXhdt75aiO3ntOTrIzUsMtp8DKbpHDnsN7M+ulw7h7emxmLN3H+hJk8+NoC1mzRpJUS
PQoVCcW4qUtomZHKTWflhF1Ko5LVNJUHzj+O938yjJvP6sGUBesYPm4G//23L9moSSslChQqUu/mrdrKjCVF3D6kF83T1UsJQ5tmTfg/3+3H+z8expW5XXnp49UMGTud//vPRWzRpJVyDBQqUu/GT11CdrM0bjize9ilNHodstL59WUn8d4DQ/nOyR15ZlZk0srxU5eyQ5NWSh0oVKRefbRsMx8WbOaOob3JSEsJuxwJdGuTwfgrI5NWDumbzSPT8jnnd9N5YsYydpeWhV2eJBCFitQbd2f81CV0aJHOtad3C7scqUaf9s35w7UDeevuszmtW0t+9+/FDBk7g+c/XMG+Ms2ILDVTqEi9eT+/mE9WbuXO4b1JT00Ouxw5iv6ds5h80yBe/9EZ9GqbyS/e/IrhD83klU9Wa9JKOSqFitQLd2f8O0vo3LIp38/tGnY5Uku5Oa15+bbB/OmWQWQ3b8JP//IFIye8z98/W6tJK6VaChWpF9MWbWJB4XbuHdGHtBT9t0skZsY5fdryt/84k6d/kEuTlCTuffkzLnpkFu8s3KBJK+UQ+umWmKuocMZNXUpOmwwuP61z2OVIHZkZI/u155/3nMPDVw1gX1kFt/1pHpf+YTaz8osULgIoVKQe/HvhBhat38F95/UlJVn/5RJdUpIxakBnpt4/hN997ySKd+7j+mc/5qpJc8hbuSXs8iRkMf0JN7MLzGyJmRWY2c+Ost5oM3Mzyw2ep5nZZDP7wswWmNnQYHmGmf3DzBab2UIz+22VbTQxs1eCfc01s5xYtk1qp7zCGT91KX3aNePiUzqFXY5EUUpyEt//Vjfee/BcfnFxP5YVlTD6yY+4afLHfLlWk1Y2VjELFTNLBh4HLgT6AVebWb9q1msO3APMrbL4VgB3PwkYCYwzs8paH3L344FTgbPM7MJg+S3AVnfvDUwAfhf9Vsk39eaCdRRs2sV95/UlOcnCLkdioElKMjee1YP3fzKUn15wPPNXb+O7j37Af/x5HgWbdoZdntSzWH76bBBQ4O7LAczsZWAU8NVh6/0KGAs8WGVZP2AagLtvMrNtQK67fwxMD5aXmtl8oEvwnlHAL4LvXwceMzNzDfSGpqy8gonvLuWEji24sH+HsMuRGMtIS+GOob24dnA3npm1gmdnLeffX25gxAntOaFjC3pmZ9KzbSY9sjM1PU8DFstQ6QysqfK8EDi96gpmdirQ1d3fMrOqobIAGBUEUVdgYPD14yrvbQlcDDx8+P7cvczMtgNtgENue2dmtwG3AXTrpg/gxdIb89eycvNunv5BLknqpTQaLdJTGTOyLzeemcOTM5fxry/XM23RRqpegZzdrAk922bSMzsSMj3bNqNHdibdWmfo6sAEF8tQqe63yIH/VsFw1gTgxmrWew44AcgDVgGzgbIq700BXgIeqewJ1bS/AwvcJwGTAHJzc9WLiZHSsgoenpbPKV2yOO+EdmGXIyFonZnGf150Av950QnsKytn9ebdLC8uYXlRCSuKd7GiuISpX21kc5UJLJMMurbOCMKmGT3aZtIrO5MebTPp0CIdM/1xEu9iGSqFRHoXlboA66o8bw70B2YE/1E6AFPM7BJ3zwPur1zRzGYD+VXeOwnId/eJ1eyvMAidLECXooTk1bw1rN22h/97+Un6RSA0SUmmT/vm9Gnf/Guvbd+9nxWbI0GzvKiE5cUlrCgqYc7yLezZf3BqmKapyfQIAubgUFqkh5PVVMNp8SKWofIJ0MfMegBrgauAaypfdPftQHblczObATzo7nlmlgGYu5eY2UigzN2/Ctb7HyKB8cPD9jcFuAH4CBgNvKfzKeHYu7+cx94rILd7K4b0ya75DdKoZWWkMiCjJQO6tjxkeUWFs3HnXlYUlbAsCJoVxbtYuHY7//5yA+VVxtOym6VFAifo4VQOrXVrk0GTFE0JVJ9iFirBeY27gLeBZOA5d19oZr8E8tx9ylHe3g5428wqiATS9QBm1gX4L2Ax
MD/4C/gxd38GeBb4k5kVEOmhXBWjpkkNXpy7mg079jLh+wPUS5E6S0oyOmY1pWNWU87sfegfJ6VlFazespsVxSUsL4oMpS0vLmH6kiJezSs8uA2DLq0yDgROr7YHh9U6tkjXub4YsMb8x3xubq7n5eWFXUaDsru0jCFjp9O3fXNevHVw2OVII7Rj735WFpeworiEZUWRryuKd7GiqISS0oPDaempSeS0OXhFWs8gbHpmZ9IyIy3EFsQ/M5vn7rnVvaYbWkhU/fGjVRTvKuWp6/uGXYo0Ui3SUzm5S0tO7nLocJq7s2nnvuBCgYM9nMXrd/LOwo2UVRlOa515cDjt4FVqzejeJkMzbNdAoSJRs3Pvfp6auYyhx7VlYPfWYZcjcggzo32LdNq3SOeMXm0OeW1/eQVrDgynBRcLFO9iVn4Rr88rrLIN6NyyadCzOXgpdI/sTDq3bKrhNBQqEkWTP1zJ1t37GTNSvRRJLKnJSfRs24yebZsx4oRDX9u1r4yVwTmbyt7NiuIS/jJ/Lbv2HbwrZlpKEj3aHOzd9KhyhVrrzMYznKZQkajYvns/T89azvn92n9t2EEkkTVrkkL/zln075x1yHJ3p2jXvuCqtJIDn8HJ37STaYs3sr/84HBay4zUKhcLNKtypVpmgxtOU6hIVDw9azk795Zxv3op0kiYGe2ap9OueTqn9zx0OK2svILCrXuqhE2kh/PRss28MX/tIeseGE5rezBoemY3o3Orpgk5X55CRY7ZlpJSJn+4gu+c3JETOrYIuxyR0KUkJ5GTnUlOdibDDnutZF8ZKzeXHDh/Uxk8f/10LTv3VhlOS06ie5uMA9PY9Kzywc/WmWlxe7m+QkWO2VMzl7Fnfzn3n9cn7FJE4l5mkxRO7JTFiZ2+Ppy2uaQ0cs6mqIRlwWXQK4pLmLGkiNLyigPrtkhPoUfbZpEpbIKwqezlZKSF+2tdoSLHZNPOvbzw0UouHdCZ3u2+PgWHiNSOmZHdrAnZzZrwrZxDr54sr3DWbt3D8mDOtMoeztwVW3jj00OH0zpmpR9ykUDllDadWzatl5vkKVTkmPxh+jL2lzv3qpciEjPJSUa3Nhl0a5PB0OMOfW1PafmB4bTIBz4jwfPmgvVs37P/wHqpyUa31hkHhtIuOqkjp3SN/kU1ChWps3Xb9vDi3NVcMbAL3dtkhl2OSKPUNC2ZEzq2qPZ85paS0q9N1LmiuISZS4vo3a6ZQkXiy2PTC3Ccu4b3DrsUEalG68w0Wme2/tqHkcsr/JAJOaNJoSJ1smbLbl79ZA3XnN6NLq0ywi5HRL6B5CSL2eXKusWa1MnD0/JJTjLuHKZeiogcpFCRb2x50S7emF/I9YO7075FetjliEgcUajINzbx3XzSU5P50dBeYZciInFGoSLfyJINO3nz83XccGYO2c2ahF2OiMQZhYp8IxPfXUqztBRuH9Iz7FJEJA4pVKTWvly7nX99uYGbz+6hO+OJSLUUKlJrE6YuJatpKrec0yPsUkQkTilUpFY+Xb2VaYs3cduQnrRITw27HBGJUzWGipn1MrMmwfdDzeweM9NdmBqZ8VOX0iYzjRvPzAm7FBGJY7XpqfwFKDez3sCzQA/gxZhWJXFl7vLNzMov5o6hvchsokkYROTIahMqFe5eBlwGTHT3+4GOsS1L4oW7M27qUto1b8J1g7uHXY6IxLnahMp+M7sauAF4K1imQfVG4sOCzXy8Ygt3Duvd4O6lLSLRV5tQuQk4A/i1u68wsx7A/8a2LIkHkV7KEjplpXPVoK5hlyMiCaDGAXJ3/wq4B8DMWgHN3f23sS5Mwjd9ySY+Xb2N31x+Ek1S1EsRkZrV5uqvGWbWwsxaAwuAyWY2PvalSZjcnXHvLKVb6wxGD+wSdjkikiBqM/yV5e47gMuBye4+EDgvtmVJ2N5euIGF63Zw74g+pNbDfa1FpGGozW+LFDPrCFzJwRP10oBVVDgTpubTs20ml57aOexyRCSB1CZU
fgm8DSxz90/MrCeQH9uyJExvfbGeJRt3cv95fWN2dzgRaZhqc6L+NeC1Ks+XA9+LZVESnrLyCiZOXcrxHZrznZP0cSQR+WZqc6K+i5n91cw2mdlGM/uLmenMbQP1t8/Wsby4hPtH9iVJvRQR+YZqM/w1GZgCdAI6A28Gy6SB2V9ewcPTlnJS5yzO79c+7HJEJAHVJlTauvtkdy8LHs8DbWNcl4TgtbxC1mzZw5iRfTFTL0VEvrnahEqxmV1nZsnB4zpgc6wLk/q1d385j76Xz2ndWjL0OP3NICJ1U5tQuZnI5cQbgPXAaCJTt9TIzC4wsyVmVmBmPzvKeqPNzM0sN3ieZmaTzewLM1tgZkOrrPtrM1tjZrsO28aNZlZkZp8Fjx/WpkaJePnj1azfvpcHzj9OvRQRqbMaQ8XdV7v7Je7e1t3bufulRD4IeVRmlgw8DlwI9AOuNrN+1azXnMg0MHOrLL412PdJwEhgnJlV1vomMOgIu33F3QcEj2dqqlEi9pSW8/iMZQzu2Zoze7UJuxwRSWB1/aj0mFqsMwgocPfl7l4KvAyMqma9XwFjgb1VlvUDpgG4+yZgG5AbPJ/j7uvrWLdU409zVlK0c596KSJyzOoaKrX5zdMZWFPleWGw7OBGzE4Furr74Z/UXwCMMrOUYFbkgUBtpsn9npl9bmavm5mm1a2FXfvKeHLmcs7pk823clqHXY6IJLi6horXYp3qgufA+4LhrAnAA9Ws9xyREMoDJgKzgbIa9vcmkOPuJwPvAi9UW5TZbWaWZ2Z5RUVFNTaioXth9kq2lJTywPnHhV2KiDQAR/xEvZntpPrwMKBpLbZdyKG9iy7AuirPmwP9gRnBkEsHYIqZXeLuecD9VWqZTQ1Tw7h71SvSngZ+d4T1JgGTAHJzc2sTjg3W9j37eWrmMs47oR0DurYMuxwRaQCOGCru3vwYt/0J0CcYvloLXAVcU2X724HsyudmNgN40N3zzCwDMHcvMbORQFlwX5cjMrOOVc61XAIsOsb6G7xnP1jBjr1l3D+yb9iliEgDUePcX3Xl7mVmdheRySiTgefcfaGZ/RLIc/cpR3l7O+BtM6sgEkjXV75gZmOJhFOGmRUCz7j7L4B7zOwSIsNkW4AbY9CsBmNrSSnPfbCCi07qwImdssIuR0QaCHNvvCNAubm5npeXF3YZofjtvxbz1PvLePu+IfRtf6ydUhFpTMxsnrvnVvea7r7UCBXt3McLs1cy6pROChQRiSqFSiP0xIxllJZXcO95OpciItFV4zmVI1wFtp3I5b4PBPdXkQSxYfte/nfuKi4/tTM9sjPDLkdEGpjanKgfT+RS4BeJXE58FZHLf5cQ+TzJ0FgVJ9H3+PQC3J17RvQJuxQRaYBqM/x1gbs/5e473X1H8DmPi9z9FaBVjOuTKCrcupuXP1nNlbld6do6I+xyRKQBqk2oVJjZlWaWFDyurPJa4710LAE9Oq0AM+Ou4b3DLkVEGqjahMq1RD4nsil4XA9cZ2ZNgbtiWJtE0criEl6fX8i1p3ejY1ZtJkQQEfnmajynEpyIv/gIL38Q3XIkVh6elk9qsnHH0F5hlyIiDViNPRUz62JmfzWzTWa20cz+YmZd6qM4iY78jTv522drueHMHNo1Tw+7HBFpwGoz/DUZmAJ0IjJ1/ZvBMkkQE9/NJyM1mduHqJciIrFVm1Bp6+6T3b0seDwP6CbmCeKrdTv4xxfrueXsHrTOTAu7HBFp4GoTKsVmdp2ZJQeP64DNNb5L4sL4qUtpkZ7CLef0DLsUEWkEahMqNwNXAhuA9cBo4KZYFiXRsWDNNt5dtJFbz+lJVtPUsMsRkUagxlBx99Xufom7t3X3du5+KXB5PdQmx2j81KW0ykjlprN7hF2KiDQSdZ1QckxUq5Coy1u5hZlLi/jRub1o1iRmt80RETlEXUOluvvPSxwZ985Ssps14Qdn5IRdiog0InUNFU3PEsdmFxTz0fLN3DmsF03TksMu
R0QakSOOixxhynuI9FI0z0eccnfGTV1Kx6x0rh7ULexyRKSROWKouLtuCZiAZi4tYt6qrfz6sv6kp6qXIiL1S3d+bEDcnfFTl9KlVVOuGNg17HJEpBFSqDQgU7/ayOeF27l3RB/SUnRoRaT+6TdPA1FREeml9MzO5LJTO4ddjog0UgqVBuKfX65n8Yad3HteH1KSdVhFJBz67dMAlFc4E6YupW/7Znz35E5hlyMijZhCpQH4+2drWVZUwv3n9SU5SZ9LFZHwKFQS3P7yCh6elk+/ji349okdwi5HRBo5hUqCe2N+Ias27+aB8/uSpF6KiIRMoZLA9pWV88i0AgZ0bcnw49uFXY6IiEIlkb36yRrWbtvDA+f3xUy9FBEJn0IlQe3dX86j7xUwKKc1Z/fODrscERFAoZKw/nfOKjbt3McY9VJEJI4oVBJQyb4ynpy5jLN7ZzO4Z5uwyxEROUChkoBe+GglxbtKGXN+37BLERE5hEIlwezYu5+nZi5n+PHtOK1bq7DLERE5hEIlwTz3wQq279nPmJHqpYhI/FGoJJBtu0t5dtYKvn1ie/p3zgq7HBGRr4lpqJjZBWa2xMwKzOxnR1lvtJm5meUGz9PMbLKZfWFmC8xsaJV1f21ma8xs12HbaGJmrwT7mmtmOTFqVmienrWcXaVl3K9eiojEqZiFipklA48DFwL9gKvNrF816zUH7gHmVll8K4C7nwSMBMaZWWWtbwKDqtnlLcBWd+8NTAB+F6WmxIXNu/Yx+cOVfPfkThzfoUXY5YiIVCuWPZVBQIG7L3f3UuBlYFQ16/0KGAvsrbKsHzANwN03AduA3OD5HHdfX812RgEvBN+/DoywBvQBjidnLmPv/nLuO69P2KWIiBxRLEOlM7CmyvPCYNkBZnYq0NXd3zrsvQuAUWaWYmY9gIFATTddP7A/dy8DtgNf+xCHmd1mZnlmlldUVPRN2hOaTTv28sePVnHZqV3o1bZZ2OWIiBxRSgy3XV0vwQ+8GBnOmgDcWM16zwEnAHnAKmA2UHYs+zuwwH0SMAkgNzf3a6/Ho8enF1Be4dw7Qr0UEYlvsQyVQg7tXXQB1lV53hzoD8wIRqk6AFPM7BJ3zwPur1zRzGYD+bXcX6GZpQBZwCcUVx0AAA26SURBVJZjbUTY1m7bw0sfr+GK3K50a5MRdjkiIkcVy+GvT4A+ZtbDzNKAq4AplS+6+3Z3z3b3HHfPAeYAl7h7npllmFkmgJmNBMrc/asa9jcFuCH4fjTwnrsnRE/kaB57L5Kldw/vHXIlIiI1i1moBOc17gLeBhYBr7r7QjP7pZldUsPb2wHzzWwR8FPg+soXzGysmRUCGWZWaGa/CF56FmhjZgXAGOCIlzAnitWbd/NaXiHXnN6NTi2bhl2OiEiNrAH8MV9nubm5npeXF3YZR/TAqwt46/N1zPrJMNq1SA+7HBERAMxsnrvnVveaPlEfpwo27eKvnxbygzO6K1BEJGEoVOLUw9PySU9N5kfn9gq7FBGRWlOoxKHFG3bw5oJ13HRWDm2aNQm7HBGRWlOoxKEJU5fSPD2F285RL0VEEotCJc58Ubidtxdu5Idn9yQrIzXsckREvhGFSpwZP3UJLTNSufnsnLBLERH5xhQqcWTeqq1MX1LE7UN60TxdvRQRSTwKlTgyfuoSspulccOZ3cMuRUSkThQqcWLO8s18WLCZH53bi4y0WE7JJiISOwqVOODujH9nKe1bNOG6weqliEjiUqjEgVn5xXy8cgt3DetNempy2OWIiNSZQiVk7s64qUvp3LIpV36rpvuQiYjEN4VKyKYt2sSCNdu4Z0RvmqSolyIiiU2hEqKKCmf81KV0b5PB5ad1CbscEZFjplAJ0dsLN/DV+h3cd14fUpN1KEQk8ek3WUjKg15K73bNuOSUzmGXIyISFQqVkLz1+TryN+3i/vP6kpxkYZcjIhIVCpUQlJVXMPHdfI7v0JwL+3cIuxwRkahRqITgjU/XsqK4
hDEj+5KkXoqINCAKlXpWWlbBI9PyOblLFiP7tQ+7HBGRqFKo1LNX89ZQuHUPY0b2xUy9FBFpWBQq9Wjv/nIee6+A3O6tOLdv27DLERGJOoVKPXrp49Vs2LGXMeerlyIiDZNCpZ7sKS3n8enLOKNnG87slR12OSIiMaFQqSd//Gglxbv28cD5fcMuRUQkZhQq9WDXvjKenLmMc/u2JTenddjliIjEjEKlHkz+YAVbd+9XL0VEGjyFSoxt372fSbOWM7Jfe07u0jLsckREYkqhEmPPfLCcnXvLGDNSvRQRafgUKjG0paSU5z5YwXdO7sgJHVuEXY6ISMwpVGLoqZnL2LO/nPvP6xN2KSIi9UKhEiObdu7lhY9WMmpAZ3q3ax52OSIi9UKhEiNPzFjG/nLn3hHqpYhI46FQiYH12/fw5zmrGX1aF3KyM8MuR0Sk3ihUYuCx9wpwnLtH9A67FBGReqVQibI1W3bzyidruOpb3ejSKiPsckRE6lVMQ8XMLjCzJWZWYGY/O8p6o83MzSw3eJ5mZpPN7AszW2BmQ6usOzBYXmBmj1gw3a+Z/cLM1prZZ8Hjoli27UgemZZPUpJx5zD1UkSk8YlZqJhZMvA4cCHQD7jazPpVs15z4B5gbpXFtwK4+0nASGCcmVXW+gRwG9AneFxQ5X0T3H1A8PhnlJtUo+VFu3jj07VcP7g7HbLS63v3IiKhi2VPZRBQ4O7L3b0UeBkYVc16vwLGAnurLOsHTANw903ANiDXzDoCLdz9I3d34I/ApTFswzfy8LR80pKTuGNor7BLEREJRSxDpTOwpsrzwmDZAWZ2KtDV3d867L0LgFFmlmJmPYCBQNfg/YVH2eZdZva5mT1nZq2qK8rMbjOzPDPLKyoqqlPDqrN0406mLFjHjWflkN2sSdS2KyKSSGIZKtXd2tAPvBgZzpoAPFDNes8RCYw8YCIwGyirYZtPAL2AAcB6YFx1Rbn7JHfPdffctm2jd0vfie8uJTMthdvO6Rm1bYqIJJqUGG67kEjvolIXYF2V582B/sCM4Fx7B2CKmV3i7nnA/ZUrmtlsIB/YGmzna9t0941V1n8aOLz3EzML123nn19s4J4RfWiVmVZfuxURiTux7Kl8AvQxsx5mlgZcBUypfNHdt7t7trvnuHsOMAe4xN3zzCzDzDIBzGwkUObuX7n7emCnmQ0Orvr6AfD3YL2OVfZ9GfBlDNt2iAlTl5LVNJVbzu5RX7sUEYlLMeupuHuZmd0FvA0kA8+5+0Iz+yWQ5+5TjvL2dsDbZlYBrAWur/LaHcDzQFPgX8EDYKyZDSAyHLYSuD2KzTmiT1dv5d1Fm/jxt48jq2lqfexSRCRuWeQiqsYpNzfX8/Lyjmkb1z87l4XrdjDrJ8PIbBLL0UQRkfhgZvPcPbe61/SJ+mPw8YotzMov5o5zeylQRERQqNSZuzPunSW0bd6E6wZ3D7scEZG4oFCpo9nLNjN3xRbuGtabpmnJYZcjIhIXFCp14O489M4SOmWlc9WgrjW/QUSkkVCo1MGMJUV8unobd4/oQ5MU9VJERCopVOpg574yTu3WktEDu9S8sohII6JLlurgklM6cfHJHQlmAhARkYB6KnWkQBER+TqFioiIRI1CRUREokahIiIiUaNQERGRqFGoiIhI1ChUREQkahQqIiISNY36fipmVgSsquPbs4HiKJYTJrUl/jSUdoDaEq+OpS3d3b1tdS806lA5FmaWd6Sb1CQatSX+NJR2gNoSr2LVFg1/iYhI1ChUREQkahQqdTcp7AKiSG2JPw2lHaC2xKuYtEXnVEREJGrUUxERkahRqIiISNQoVGrJzFaa2Rdm9pmZ5QXLWpvZVDPLD762CrvOmhyhHb8ws7XBss/M7KKw66wNM2tpZq+b2WIzW2RmZyTiMYEjtiXhjouZHVel3s/MbIeZ3Zdox+Uo7Ui4YwJgZveb2UIz+9LMXjKzdDPrYWZzg2PyipmlRWVfOqdSO2a2Esh1
9+Iqy8YCW9z9t2b2M6CVu/80rBpr4wjt+AWwy90fCquuujCzF4BZ7v5M8AORAfwnCXZM4IhtuY8EPC6VzCwZWAucDtxJAh4X+Fo7biLBjomZdQY+APq5+x4zexX4J3AR8Ia7v2xmTwIL3P2JY92feirHZhTwQvD9C8ClIdbSqJhZC2AI8CyAu5e6+zYS8JgcpS2JbgSwzN1XkYDHpYqq7UhUKUBTM0sh8gfLemA48HrwetSOiUKl9hx4x8zmmdltwbL27r4eIPjaLrTqaq+6dgDcZWafm9lz8T40EegJFAGTzexTM3vGzDJJzGNypLZA4h2Xqq4CXgq+T8TjUqlqOyDBjom7rwUeAlYTCZPtwDxgm7uXBasVAp2jsT+FSu2d5e6nARcCd5rZkLALqqPq2vEE0AsYQOQ/3bgQ66utFOA04Al3PxUoAX4Wbkl1dqS2JOJxASAYwrsEeC3sWo5FNe1IuGMSBN8ooAfQCcgk8vN/uKicC1Go1JK7rwu+bgL+CgwCNppZR4Dg66bwKqyd6trh7hvdvdzdK4CnibQt3hUChe4+N3j+OpFfzAl3TDhCWxL0uFS6EJjv7huD54l4XOCwdiToMTkPWOHuRe6+H3gDOBNoGQyHAXQB1kVjZwqVWjCzTDNrXvk9cD7wJTAFuCFY7Qbg7+FUWDtHakflD3vgMiJti2vuvgFYY2bHBYtGAF+RYMcEjtyWRDwuVVzNoUNGCXdcAoe0I0GPyWpgsJllmJlx8GdlOjA6WCdqx0RXf9WCmfUk8lc9RIYqXnT3X5tZG+BVoBuRA3eFu28JqcwaHaUdfyLSnXdgJXB75fh3PDOzAcAzQBqwnMiVOUkk0DGpdIS2PEJiHpcMYA3Q0923B8sS6mcFjtiORP1Z+f+B7wNlwKfAD4mcQ3kZaB0su87d9x3zvhQqIiISLRr+EhGRqFGoiIhI1ChUREQkahQqIiISNQoVERGJGoWKiIhEjUJFJI6Y2fNmNrrmNUXik0JFRESiRqEiUgMzywlunPV0cKOjd8ysqZnNMLPcYJ3s4F41mNmNZvY3M3vTzFaY2V1mNiaYgXiOmbWu5X4HmtnMYEbpt6vMnXWrmX1iZgvM7C/B9BtZFrkBW1KwToaZrTGzVDPrZWb/DrYzy8yOD9a5Irhp0wIzez8m/3jS6ChURGqnD/C4u58IbAO+V8P6/YFriEw4+GtgdzAD8UfAD2ramZmlAo8Co919IPBcsB2I3FjpW+5+CrAIuCWYRmQBcG6wzsXA28EEgpOAu4PtPAj8IVjn58C3g+1cUlNNIrWRUvMqIkJkltfPgu/nATk1rD/d3XcCO81sO/BmsPwL4ORa7O84IsE0NTIHIMlEploH6G9m/wO0BJoBbwfLXyEyv9N0IvcA+YOZNSMyI+1rwXYAmgRfPwSeD+4E+EYtahKpkUJFpHaqTrRXDjQlMjlfZW8//SjrV1R5XkHtfu4MWOjuZ1Tz2vPApe6+wMxuBIYGy6cAvwmG1wYC7xG5d8Y2dx9w+Ebc/UdmdjrwHeAzMxvg7ptrUZvIEWn4S6TuVhL55Q0HpxCPliVAWzM7AyLDYWZ2YvBac2B9MER2beUb3H0X8DHwMPBWcN+PHcAKM7si2I6Z2SnB973cfa67/xwoBrpGuQ3SCClUROruIeAOM5sNZEdzw+5eSiSofmdmC4DPiAxjAfw3MBeYCiw+7K2vANcFXytdC9wSbGchkbsAAvzezL4wsy+B94mckxE5Jpr6XkREokY9FRERiRqdqBcJgZk9Dpx12OKH3X1yGPWIRIuGv0REJGo0/CUiIlGjUBERkahRqIiISNQoVEREJGr+H0cdX0HkLL/AAAAAAElFTkSuQmCC\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the CV error curve\n",
    "test_means = grid_search.cv_results_[ 'mean_test_score' ]\n",
    "test_stds = grid_search.cv_results_[ 'std_test_score' ]\n",
    "train_means = grid_search.cv_results_[ 'mean_train_score' ]\n",
    "train_stds = grid_search.cv_results_[ 'std_train_score' ]\n",
    "\n",
    "n_leafs = len(num_leaves_s)\n",
    "\n",
    "x_axis = num_leaves_s\n",
    "plt.plot(x_axis, -test_means)\n",
    "#plt.errorbar(x_axis, -test_means, yerr=test_stds,label = ' Test')\n",
    "#plt.errorbar(x_axis, -train_means, yerr=train_stds,label = ' Train')\n",
    "plt.xlabel( 'num_leaves' )\n",
    "plt.ylabel( 'Log Loss' )\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([-0.49037893, -0.49258863, -0.49199925, -0.49182955])"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_means"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### The scores fluctuate only slightly; take the suggested value num_leaves=70, with no need for finer tuning"
   ]
  },
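  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 3. min_child_samples\n",
    "The minimum number of samples in a leaf node.\n",
    "\n",
    "With 70 leaves and 9 classes, each class gets about 8 leaves on average.\n",
    "For the smallest class (rare events), the number of samples seen by each tree is about 200 * 2/3 * 0.7 ≈ 93, roughly 100.\n",
    "Each leaf therefore holds about 100/8 ≈ 12 samples.\n",
    "\n",
    "Search range: 10-50"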
   ]
  },
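  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The back-of-the-envelope estimate above can be reproduced with a few lines of Python. The function name `estimate_min_leaf_samples` and its argument names are hypothetical; the inputs (200, 2/3, 0.7, 70 leaves, 9 classes) are taken directly from the text, and the sketch only redoes the arithmetic without rounding.\n",
    "\n",
    "```python\n",
    "def estimate_min_leaf_samples(rare_class_samples, train_fraction, sample_fraction,\n",
    "                              num_leaves, num_classes):\n",
    "    # Average number of leaves available to each class\n",
    "    leaves_per_class = num_leaves / num_classes\n",
    "    # Rare-class samples that reach a single tree after the CV split and sampling\n",
    "    rare_samples_per_tree = rare_class_samples * train_fraction * sample_fraction\n",
    "    # Rare-class samples per leaf if they spread evenly over that class's leaves\n",
    "    samples_per_leaf = rare_samples_per_tree / leaves_per_class\n",
    "    return leaves_per_class, rare_samples_per_tree, samples_per_leaf\n",
    "\n",
    "leaves_per_class, rare_samples_per_tree, samples_per_leaf = estimate_min_leaf_samples(\n",
    "    200, 2 / 3, 0.7, 70, 9)\n",
    "print(round(leaves_per_class, 1), round(rare_samples_per_tree, 1), round(samples_per_leaf, 1))\n",
    "```\n",
    "\n",
    "Without rounding there are about 7.8 leaves per class and about 93 rare-class samples per tree, which still comes to roughly 12 samples per leaf, so a search grid starting at 10 brackets the estimate."
   ]
  },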
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 3 folds for each of 4 candidates, totalling 12 fits\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.\n",
      "[Parallel(n_jobs=4)]: Done   8 out of  12 | elapsed: 22.6min remaining: 11.3min\n",
      "[Parallel(n_jobs=4)]: Done  12 out of  12 | elapsed: 32.1min finished\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "GridSearchCV(cv=StratifiedKFold(n_splits=3, random_state=3, shuffle=True),\n",
       "             error_score='raise-deprecating',\n",
       "             estimator=LGBMClassifier(boosting_type='goss', class_weight=None,\n",
       "                                      colsample_bytree=0.7,\n",
       "                                      importance_type='split',\n",
       "                                      learning_rate=0.1, max_bin=127,\n",
       "                                      max_depth=7, min_child_samples=20,\n",
       "                                      min_child_weight=0.001,\n",
       "                                      min_split_gain=0.0, n_estimators=368,\n",
       "                                      n_jobs=4, num_class=9, num_leaves=70,\n",
       "                                      objective='multiclass', random_state=None,\n",
       "                                      reg_alpha=0.0, reg_lambda=0.0,\n",
       "                                      silent=False, subsample=1.0,\n",
       "                                      subsample_for_bin=200000,\n",
       "                                      subsample_freq=0),\n",
       "             iid='warn', n_jobs=4,\n",
       "             param_grid={'min_child_samples': range(10, 50, 10)},\n",
       "             pre_dispatch='2*n_jobs', refit=False, return_train_score='warn',\n",
       "             scoring='neg_log_loss', verbose=5)"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "params = {'boosting_type': 'goss',\n",
    "          'objective': 'multiclass',\n",
    "          'num_class':9, \n",
    "          'n_jobs': 4,\n",
    "          'learning_rate': 0.1,\n",
    "          'n_estimators':n_estimators_1,\n",
    "          'max_depth': 7,\n",
    "          'num_leaves':70,\n",
    "          'max_bin': 127, # 2^7 - 1; the original features are integers that rarely exceed 100\n",
    "          'colsample_bytree': 0.7,\n",
    "         }\n",
    "lg = LGBMClassifier(silent=False,  **params)\n",
    "\n",
    "min_child_samples_s = range(10,50,10) \n",
    "tuned_parameters = dict( min_child_samples = min_child_samples_s)\n",
    "\n",
    "grid_search = GridSearchCV(lg, n_jobs=4,  param_grid=tuned_parameters, cv = kfold, scoring=\"neg_log_loss\", verbose=5, refit = False,return_train_score='warn')\n",
    "grid_search.fit(X_train , y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.4889818006530074\n",
      "{'min_child_samples': 40}\n"
     ]
    }
   ],
   "source": [
    "# examine the best model\n",
    "print(-grid_search.best_score_)\n",
    "print(grid_search.best_params_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYAAAAD4CAYAAADlwTGnAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAgAElEQVR4nO3deXhV5bn38e+dCQhDAiRhSAIJGuYxRFSwdSotthasdWDQI56+bY9TrbWDnrbntOrpYGu1tdr39bSCreDUqqW0aluttWWSEGYQQUggYQpDwjyE3O8fe4cGDCSEhLWH3+e6uGStvXb2/bhCflnP2s++zd0REZH4kxB0ASIiEgwFgIhInFIAiIjEKQWAiEicUgCIiMSppKALOBMZGRmel5cXdBkiIlFl0aJFO9w98+T9URUAeXl5FBcXB12GiEhUMbOyhvZrCkhEJE4pAERE4pQCQEQkTikARETilAJARCROKQBEROKUAkBEJE7FRQC8uHATb67eFnQZIiIRJeYD4OixWn49v5SvvLiUTbsOBF2OiEjEiPkASE5M4MnJI6l1546ZJRyuORZ0SSIiEaFJAWBm48xsjZmtM7P7TnPcdWbmZlYU3k4xs2lmttzMlprZZfWOnRTev8zMXjezjLMezSn06prKj68fxrLyah6avbq1XkZEJKo0GgBmlgg8AVwFDAQmmdnABo7rCHwJWFBv9+cB3H0IMBZ4xMwSzCwJ+ClwubsPBZYBd57lWE7rE4O68/mP5POb+WXMWrq5NV9KRCQqNOUKYBSwzt3Xu/sR4HlgQgPHPQg8DByqt28g8CaAu28HqoAiwMJ/2puZAZ2AVv+p/PVx/Snq3Zn7freMddv3tfbLiYhEtKYEQDawqd52eXjfcWY2Ash199knPXcpMMHMkswsHxgZPu4ocBuwnNAP/oHArxp6cTP7gpkVm1lxZWVlU8Z0SsmJCfx8ciHtkhO5fcYiDhypOauvJyISzZoSANbAPj/+oFkC8ChwbwPHPU0oMIqBx4C5QI2ZJRMKgBFAT0JTQPc39OLu/pS7F7l7UWbmhz7O+ox1T2vLTyeOYO32fXzrlRW4e+NPEhGJQU0JgHIgt952DidO13QEBgNvm1kpcBEwy8yK3L3G3e9x9+HuPgFIB9YCwwHc/QMP/QR+ERh91qNpoksKMrj7ygJeXlzB8ws3Nf4EEZEY1JQAWAgUmFm+maUAE4FZdQ+6e7W7Z7h7nrvnAfOB8e5ebGapZtYewMzGAjXuvgqoAAaaWd2v9GOBc/r2nLuuKOAjBRn896yVrKioPpcvLSISERoNAHevIfQOnTcI/ZB+0d1XmtkDZja+kadnASVmthr4BnBz+GtuBr4LvGNmywhdEXyv+cM4c4kJxmM3DqdLagp3zCxhz6Gj5/LlRUQCZ9E0B15UVOQt3RKyuHQXE5+az5UDsvi/N40k9KYkEZHYYWaL3L3o5P0xvxK4MUV5Xbjvqv68sXIbv/rnhqDLERE5Z+I+AAA+d0k+nxjUjR+89h6LynYFXY6IyDmhAADMjIevG0bP9HbcMWMxO/cdDrokEZFWpwAIS2uXzJNTCtl14AhffmEJx2qj596IiEhzKADqGZydxnc+PYh/rN3Bz99aF3Q5IiKtSgFwkkmjcrl2RDaPvfk+/1y7I+hyRERajQLgJGbGQ58ZTEFWB+5+fjFbqw81/iQRkSikAGhAakoST04p5ODRY9z1XAlHj9UGXZKISItTAJzC+Vkd+f61Q1hYupsfvbEm6HJERFqcAuA0JgzP5qaLevHUO+v588qtQZcjItKiFACN+PbVAxmSnca9Ly1l4041lReR2KEAaESbpESenFKIAbfPXMSho2oqLyKxQQHQBLldUvnJDcNZUbGHB2evCrocEZEWoQBooo8N7MYXL+3DjAUbeXVxRdDliIicNQXAGfjax/sxKq8L97+8nLXb9gZdjojIWVEA
nIGkxAQenzyC9m0SuW1GCfsPq6m8iEQvBcAZ6tYp1FT+g8p9fPOV5WoqLyJRSwHQDGPOz+ArH+vLq0s2M/PdjUGXIyLSLAqAZrrj8vO5tG8m3521iuXlaiovItFHAdBMCQnGozcOp2uHFG6fuYjqA2oqLyLRRQFwFrq0T+HnkwvZUnWIr/52qe4HiEhUUQCcpZG9O3P/Jwfwl1Xb+N9/rA+6HBGRJlMAtIB/H5PHVYO788PX17CwVE3lRSQ6KABagJnxw+uGktu5HXfOLGGHmsqLSBRQALSQTm2TeXLKSKoOHOXLz6upvIhEPgVACxrYsxMPTBjEP9ft4Kdvrg26HBGR01IAtLAbinL5bGEOj7+1lnferwy6HBGRU1IAtDAz46FrBtM3qyNffmEJW6oPBl2SiEiDFACtoF1KIk/eVMjho8e4c+ZiNZUXkYikAGgl52V24IfXDWVR2W5++Np7QZcjIvIhCoBWdPXQntxycW9++c8NvL5iS9DliIicQAHQyv7zUwMYlpPG115aRumO/UGXIyJynAKglbVJSuSJKYUkJBi3zyhRU3kRiRgKgHMgp3Mqj944jFVb9vDdP6wMuhwREaCJAWBm48xsjZmtM7P7TnPcdWbmZlYU3k4xs2lmttzMlprZZfWOTTGzp8zsfTN7z8w+e9ajiWBX9O/G7Zedx3PvbuLlkvKgyxERIamxA8wsEXgCGAuUAwvNbJa7rzrpuI7Al4AF9XZ/HsDdh5hZFvCamV3g7rXAN4Ht7t7XzBKALi0yogj2lbF9WVS2m2++soJBPdPo171j0CWJSBxryhXAKGCdu6939yPA88CEBo57EHgYOFRv30DgTQB33w5UAUXhx/4d+H74sVp339GsEUSRpMQEHp80gvZtkrhtxiL2qam8iASoKQGQDWyqt10e3necmY0Act199knPXQpMMLMkM8sHRgK5ZpYefvxBMysxs5fMrFtDL25mXzCzYjMrrqyM/o9WyOrUlscnjaB0x37uf1lN5UUkOE0JAGtg3/GfWuHpm0eBexs47mlCgVEMPAbMBWoITT3lAHPcvRCYB/y4oRd396fcvcjdizIzM5tQbuS7+Lyu3Pvxfvxh6WaenV8WdDkiEqeaEgDlQG697Rxgc73tjsBg4G0zKwUuAmaZWZG717j7Pe4+3N0nAOnAWmAncAB4Jfw1XgIKz2okUea2S8/j8n6ZPDh7NcvKq4IuR0TiUFMCYCFQYGb5ZpYCTARm1T3o7tXunuHuee6eB8wHxrt7sZmlmll7ADMbC9S4+yoPzXv8Abgs/GWuBE64qRzrEhKMn9wwnMyObbh9RomayovIOddoALh7DXAn8AawGnjR3Vea2QNmNr6Rp2cBJWa2GvgGcHO9x74BfMfMloX3NzSFFNM6t0/hiSmFbNtziK+8uIRaNZERkXPIoukmZFFRkRcXFwddRoubPmcD3/nDKr4xrj+3XXZe0OWISIwxs0XuXnTyfq0EjgC3jM7jU0N68OM/r2HB+p1BlyMicUIBEAHMjB98dgi9uqRy13OLqdyrpvIi0voUABGiY9tknpxSSPXBo9z9/GI1lReRVqcAiCADenTioWsGM/eDnTz21/eDLkdEYpwCIMJcX5TLDUU5PP7WOv62ZnvQ5YhIDFMARKAHJgymf/eO3PPCEiqq1FReRFqHAiACtU1O5Bc3jaTmmHPHjBKO1KipvIi0PAVAhMrPaM/D1w1lyaYqvv/a6qDLEZEYpACIYJ8c0oNbx+QxbU4pf1qupvIi0rIUABHu/qsGMKJXOl//7TI2qKm8iLQgBUCES0lK4OeTC0lKNG57dpGayotIi1EARIHs9HY8euNw3tu6l//6/YqgyxGRGKEAiBKX98vizsvP58Xicl4q3tT4E0REGqEAiCL3jO3LxX268u3fr+C9rXuCLkdEopwCIIokJhg/nTScTm2Tuf3ZEvYeUhMZEWk+BUCUyeoYaipftusA96mpvIicBQVAFLqwT1e++vF+
/HHZFp6ZWxp0OSISpRQAUeqLH+3Dlf2z+J8/rWbxxt1BlyMiUUgBEKUSEoxHbhhGVse23DlzMbv3Hwm6JBGJMgqAKJaemsKTUwqp3HtYTeVF5IwpAKLcsNx0vn31AP62ppJf/P2DoMsRkSiiAIgBN13Um08P68kjf17DvA/UVF5EmkYBEAPMjO9fO4S8jPbc9dxitu85FHRJIhIFFAAxokObJH4xZST7Dh/lrucWU3NMTWRE5PQUADGkX/eO/M81Q1iwYRc/+YuayovI6SkAYsxnR+Yw8YJcnnz7A956b1vQ5YhIBFMAxKDvjB/EwB6duOeFpZTvPhB0OSISoRQAMSjUVL6Q2tpQU/nDNWoiIyIfpgCIUb27tudH1w9laXk13/ujmsqLyIcpAGLYuME9+Nwl+Twzr4w/LN0cdDkiEmEUADHuvqv6U9grnft+t4wPKvcFXY6IRBAFQIxLTgw1lU9JSuD2Z0s4eET3A0QkRAEQB3qmt+OxiSN4f/tevvXqCjWRERGgiQFgZuPMbI2ZrTOz+05z3HVm5mZWFN5OMbNpZrbczJaa2WUNPGeWma1o9gikSS7tm8ldVxTwu5JyXlRTeRGhCQFgZonAE8BVwEBgkpkNbOC4jsCXgAX1dn8ewN2HAGOBR8wsod5zrgU0MX2O3H1lAZecn8F//X4lqzarqbxIvGvKFcAoYJ27r3f3I8DzwIQGjnsQeBio/0lkA4E3Adx9O1AF1F0ddAC+AjzU7OrljCQmGI9NHE56ajK3z1jEHjWVF4lrTQmAbKD+nEF5eN9xZjYCyHX32Sc9dykwwcySzCwfGAnkhh97EHgEOO1SVTP7gpkVm1lxZWVlE8qV08no0IafTy5k0+6DfOO3y3Q/QCSONSUArIF9x39qhKd0HgXubeC4pwkFRjHwGDAXqDGz4cD57v5KYy/u7k+5e5G7F2VmZjahXGnMBXld+Pon+vHaiq1Mm1MadDkiEpCkJhxTzr9+awfIAeqvKuoIDAbeNjOA7sAsMxvv7sXAPXUHmtlcYC1wKTDSzErDNWSZ2dvuflnzhyJn4gsf7UNx2W6+96fVDMtNZ2TvzkGXJCLnWFOuABYCBWaWb2YpwERgVt2D7l7t7hnunufuecB8YLy7F5tZqpm1BzCzsUCNu69y91+4e8/w8ZcA7+uH/7llZvz4+mH0SG/LnTNL2KWm8iJxp9EAcPca4E7gDWA18KK7rzSzB8xsfCNPzwJKzGw18A3g5rMtWFpOWrtknpw8kp37jvDlF9RUXiTeWDTdBCwqKvLi4uKgy4g5z84v41uvruDesX2568qCoMsRkRZmZovcvejk/VoJLEy5sBcThvfk0b++z9x1O4IuR0TOEQWAYGZ87zND6JPZgS89v5htaiovEhcUAAJA+zZJ/GJKIfsPH+OumWoqLxIPFAByXEG3jnz/2iG8W7qLH/15TdDliEgrUwDICa4Zkc3kC3vx//6+nr+sUlN5kVimAJAP+a+rBzI4uxP3vriETbvUVF4kVikA5EPaJify5OSROHDHTDWVF4lVCgBpUK+uqTxy/TCWlVfz0Gw1lReJRQoAOaWPD+rOFz7ah9/ML+P3SyqCLkdEWpgCQE7ra5/oxwV5nbn/5eWs27436HJEpAUpAOS0khMTeHxSIe2SE7nt2RIOHKkJuiQRaSEKAGlU97S2/HTiCNZV7uNbr6ipvEisUABIk1xSkMHdVxbw8uIKnl+opvIisUABIE121xUFfKQgg/+etZIVFdVBlyMiZ0kBIE2WmGA8duNwuqSmcPuMEqoPqqm8SDRTAMgZ6dqhDU9MGcHmqoN87aWluh8gEsUUAHLGRvbuwn1X9efPq7bxq39uCLocEWkmBYA0y+cuyecTg7rxg9feY1HZrqDLEZFmUABIs5gZD183jJ7p7bhjxmJ27jscdEkicoYUANJsae2SeXJKIbsOhJrKH1NTeZGoogCQszI4O43vjh/EP9bu4PG31gZdjoicAQWAnLWJ
F+Ry7YhsfvrmWv6xtjLockSkiRQActbMjIc+M5iCrA58+fklbK1WU3mRaKAAkBaRmpLEk1MKOXj0GHfOLOGomsqLRDwFgLSY87NCTeWLy3bzozfUVF4k0ikApEVNGJ7NzRf15ql31vPGyq1BlyMip6EAkBb3rasHMDQnja++tJSNO9VUXiRSKQCkxbVJSuSJyYUYcNuMRRw6qqbyIpFIASCtIrdLKj+5YTgrN+/hgdmrgi5HRBqgAJBW87GB3fjipX2YuWAjry5WU3mRSKMAkFb1tY/3Y1R+F+5/eTlrt6mpvEgkUQBIq0pKTODnk0bQvk0it80oYf9hNZUXiRQKAGl1WZ3a8rOJI1hfuY//fGW5msiIRAgFgJwTo8/P4J6P9eX3SzYzY8HGoMsREZoYAGY2zszWmNk6M7vvNMddZ2ZuZkXh7RQzm2Zmy81sqZldFt6famZ/NLP3zGylmf2gRUYjEe2Oy8/n0r6ZPPCHVSwvV1N5kaA1GgBmlgg8AVwFDAQmmdnABo7rCHwJWFBv9+cB3H0IMBZ4xMzqXvPH7t4fGAGMMbOrzmYgEvkSEoxHbxxO1w4p3D5zEdUH1FReJEhNuQIYBaxz9/XufgR4HpjQwHEPAg8D9T8KciDwJoC7bweqgCJ3P+DufwvvPwKUADnNHoVEjS7tU3hiSiFbqg5xz4tLqD6oEBAJSlMCIBvYVG+7PLzvODMbAeS6++yTnrsUmGBmSWaWD4wEck96bjrwacJBcTIz+4KZFZtZcWWlPms+FhT26sy3rx7IW+9t5+Lvv8m3X13Buu37gi5LJO4kNeEYa2Df8bdxhKd0HgWmNnDc08AAoBgoA+YCNfWemwQ8B/zM3dc39OLu/hTwFEBRUZHePhIjbhmdx8jenZk+t5QXFm7iN/PL+GjfTG4dncelfTNJSGjo205EWpI19pY8M7sY+I67fyK8fT+Au38/vJ0GfADU/QrXHdgFjHf34pO+1lzg/7j7qvD208A+d/9SU4otKiry4uLixg+UqLJj32FmLtjIs/PL2L73MPkZ7bnl4t5cV5RLhzZN+R1FRE7HzBa5e9GH9jchAJKA94ErgQpgITDZ3Vee4vi3ga+6e7GZpYZfY7+ZjQW+7e4fDR/3EKGrg+vdvUndQxQAse1ITS2vrdjCtDmlLNlURYc2SVxflMMtF+eRl9E+6PJEotapAqDRX6/cvcbM7gTeABKBp919pZk9ABS7+6zTPD0LeMPMagmFx83hYnKAbwLvASVmBvBzd//lGY5LYkhKUgIThmczYXg2izfuZvrcUn4zr4zpc0u5ol8WU8fkccn5GYS/X0TkLDV6BRBJdAUQf7btOcSM+WXMWLCRnfuPUJDVgVtG53FtYTapKZoeEmmKZk8BRRIFQPw6dPQYs5dtYdqcDazcvIdObZOYNKoXN1/cm5zOqUGXJxLRFAASE9yd4rLdTJ9Tyusrt+LufHxgd6aOyePC/C6aHhJpQLPvAYhEEjPjgrwuXJDXhYqqgzw7v4zn3t3I6yu3MqBHJ24dncf44T1pm5wYdKkiEU9XABL1Dh45xu+XVDBtTilrtu2lS/sUJo3K5eaL8uie1jbo8kQCpykgiXnuzrz1O5k2p5S/rt5GohnjBnfn1jH5FPZK1/SQxC1NAUnMMzNGn5fB6PMy2LjzAL+eV8oLxZuYvWwLQ3PSuHVMHp8c0oM2SZoeEgFdAUiM23+4hpdLypk2t5T1lfvJ6NCGmy7qxeQLe5HVUdNDEh80BSRxrbbW+ce6HUyfs4G/rakkOdH49NCeTB2Tx9Cc9KDLE2lVmgKSuJaQYFzaN5NL+2ayvnIfv55XxkvFm3h5cQUje3dm6ug8xg3uTnKimuRJ/NAVgMStPYeO8tvicp6ZV0rZzgN079SWmy/uzcQLcunaoU3Q5Ym0GE0BiZzCsVrn7TXbmTanlH+u20FKUgLXDO/J1NH5DOzZKejyRM6apoBETiExwbhyQDeuHNCNtdv2Mm1uKS+X
lPNicTkX5nfh1jH5jB3YjUT1KJAYoysAkQZUHTjCCws38et5ZVRUHSQ7vR23jO7NjUW9SEtNDro8kTOiKSCRZqg5VstfV29j2pxSFmzYRbvkRK4tzGbq6DwKunUMujyRJlEAiJylVZv3MH3uBl5dspkjNbV8pCCDqaPzuLxfllpYSkRTAIi0kF37j/Dcuxv5zbwytu45RO+uqdxycR7XF+XQsa2mhyTyKABEWtjRY7W8vmIr0+eWsqhsN+1TErm+KJdbRueRrxaWEkEUACKtaOmmKqbPLWX2ss0cPeZc3i+TW8fk85ECtbCU4CkARM6B7XsPMWP+RmYs2MiOfYc5L7M9U8fkc+2IbNq30buuJRgKAJFz6HDNMf64bAvT5pSyvKKajm2TmHhBLv92cR65XdTCUs4tBYBIANydko27mTanlNdWhFpYfmxAN6aOyePiPl01PSTnhFYCiwTAzBjZuwsje3dhS3WoheXMBRv586pt9O/ekamj87hmRLZaWEogdAUgco4dOnqMWUs28/ScDby3dS/pqclMGtWLmy/qTc/0dkGXJzFIU0AiEcbdWbBhF9PnlPLnVVsxM8YN6s6tY/IY2buzpoekxWgKSCTCmBkX9enKRX26smnXAZ6dX8Zz727kj8u3MDi7E7eOzufqYWphKa1HVwAiEeTAkRpeWVzB9DmlrN2+j4wOKUy+sDc3XdiLrE5qYSnNoykgkSji7sxZt5Npczbw1prtJCUYnxrSg6lj8hmeqxaWcmY0BSQSRcyMSwoyuKQgg9Id+3lmXikvFZfz6pLNjOiVztTReXxySA+1sJSzoisAkSix73ANvy3exDPzytiwYz/dOrXhpgt7M/nCXmphKaelKSCRGFFb6/z9/UqmzS3lnfcrSUlKYPywntw6Jo9BPdOCLk8ikKaARGJEQoJxef8sLu+fxbrte3lmbhm/Kynnt4vKGZXXhVvH5DF2YDeSND0kjdAVgEgMqD54lJeKNzF9binlu0MtLG++uDcTL8glPTUl6PIkYJoCEokDx2qdN8MtLOet30nb5AQ+MyKHqaPz6NddLSzjlQJAJM68t3UP0+eU8sriCg7X1DLm/K5MHZ3PFf2zSFQLy7hyqgBo0iShmY0zszVmts7M7jvNcdeZmZtZUXg7xcymmdlyM1tqZpfVO3ZkeP86M/uZad27SIvq370TP/jsUObffyVfH9eP9ZX7+fyvi7n8x2/zy3+sZ8+ho0GXKAFr9ArAzBKB94GxQDmwEJjk7qtOOq4j8EcgBbjT3YvN7A6gyN1vNbMs4DXgAnevNbN3gbuB+cCfgJ+5+2unq0VXACLNV3OsljdWbmP63A0sLN1Nakoi143M4ZbReZyX2SHo8qQVnc0VwChgnbuvd/cjwPPAhAaOexB4GDhUb99A4E0Ad98OVAFFZtYD6OTu8zyUQL8GrjmTAYnImUlKTOBTQ3vw0n+MZvZdl3DV4B48/+4mrnzk79zy9Lu8vWY7tbXRMyUsZ68pAZANbKq3XR7ed5yZjQBy3X32Sc9dCkwwsyQzywdGArnh55ef7mvW+9pfMLNiMyuurKxsQrki0pjB2Wk8csMw5t5/BV8Z25dVW/YwddpCPvaTv/Pw6+/x+oqtbKk+SDTdI5Qz15R1AA3NzR//rjCzBOBRYGoDxz0NDACKgTJgLlDT2Nc8Yaf7U8BTEJoCakK9ItJEGR3a8KUrC/iPS8/jtRVb+M28Mp56Zz014SuBjA5tGJaTxpCcNIblpDMkJ40MrTqOGU0JgHJCv7XXyQE219vuCAwG3g7fx+0OzDKz8e5eDNxTd6CZzQXWArvDX+dUX1NEzqGUpAQmDM9mwvBsDh09xqote1heXs3S8iqWl1fz1prt1F0MZKe3Y0h2GkNz0xiaHQqFtHbJwQ5AmqUpAbAQKAhP4VQAE4HJdQ+6ezWQUbdtZm8DXw3fBE4ldKN5v5mNBWrqbh6b2V4zuwhYAPwb8HgLjUlEzkLb5EQKe3WmsFfn
4/v2Ha5hZUU1y8qrWVZRzbLyKl5fufX443ldUxmak87QnDSG5qQzqGcn2rfRBw1EukbPkLvXmNmdwBtAIvC0u680sweAYnefdZqnZwFvmFktofC4ud5jtwHTgXaE3h102ncAiUhwOrRJ4sI+XbmwT9fj+6oOHGF5XSiUV1FcuotZS0MX8gkG52d1YEh2OsNy0xiSncaAHp3U+zjCaCGYiLSYyr2HWV5RxdJN1eFwqGLHviMAJCUY/bp3rHelkEbfbh31kdbngFYCi8g55+5sqT7EsvKq8JVCKBT2HKoBoE1SAgN7dmJodtrxYOiT2UErlVuYAkBEIoK7U7bzQOhewqYqllVUs6KimgNHjgHQPiWRQdlp4XcfpTMsJ41eXVLRhwU0nz4OWkQigpmRl9GevIz2jB/WEwh9iN36yn0sLa9meXkVS8ureWZeGUdqNgCQ1i6ZoTmhewl1Vwo90toqFM6SrgBEJCIdPVbLmq17j99LWFZezZqte09Yo1B3L6Hu3Udao9AwXQGISFRJTkxgcHYag7PTmDSqFwCHjh5j9ZY9J9xP+Fu9NQo909oyNLxgbVhOOkOy00hL1RqFU1EAiEjUaJucyIhenRlRb43C/sM1rKgIveuobgrp5DUKdfcShoQDRWsUQvR/QUSiWvsG1ihUHzgaDoTQSuZFpbv4Q3iNghmcn9nhhLejxusaBQWAiMSctNRkLinI4JKC4x9ScHyNQt300d/f387vSkKfSfmvNQqhewlDstPo1z321yjoJrCIxKV/rVEI3UuoW9VcfTDUKCclKYGBPTqd8HbUaF2joHUAIiKNcHc27jpwwttRV1ZUs/+kNQpDs9MYmpvO0Ow0eneN/DUKCgARkWaoW6NQd6WwrKKalZv3cKSmFoBObZNOuJ8wNCc94tYoKABERFrI0WO1vL9t7wlvRz1xjULK8XsJoQ/DSyezY3BrFLQOQESkhSQnJjCoZxqDeqYxaVRoX90aheUV1eEPw/vwGoUhOf9ayTw0Oz3wNQoKABGRFnDCGoWLQ/v2H65h5eY99T4Mr4o3Vm47/pzedX0UskPTR4Oy0+hwDtcoKABERFpJ+zZJjMrvwqj8Lsf31a1RWFZRxbJN1ZSU7f7QGoX6LTgHtuIaBQ6SV5EAAAR6SURBVAWAiMg51NAahR37Dp/QgvOd93fwckkFEFqj0LdbR2Z+/kLSU1NatBYFgIhIwDI6tOHy/llc3j8LCL0ddeueQ8fvJazbvq9V+i4rAEREIoyZ0SOtHT3S2jFucPdWe53YXucsIiKnpAAQEYlTCgARkTilABARiVMKABGROKUAEBGJUwoAEZE4pQAQEYlTUfVx0GZWCZQ18+kZwI4WLCdIsTKWWBkHaCyRKlbGcrbj6O3umSfvjKoAOBtmVtzQ52FHo1gZS6yMAzSWSBUrY2mtcWgKSEQkTikARETiVDwFwFNBF9CCYmUssTIO0FgiVayMpVXGETf3AERE5ETxdAUgIiL1KABEROJUTAaAmT1tZtvNbEW9fV3M7C9mtjb8385B1tgUpxjHd8yswsyWhP98Msgam8rMcs3sb2a22sxWmtnd4f1RdV5OM46oOy9m1tbM3jWzpeGxfDe8P9/MFoTPyQtm1rJ9CFvBacYy3cw21Dsvw4OutSnMLNHMFpvZ7PB2q5yTmAwAYDow7qR99wFvunsB8GZ4O9JN58PjAHjU3YeH//zpHNfUXDXAve4+ALgIuMPMBhJ95+VU44DoOy+HgSvcfRgwHBhnZhcBPyQ0lgJgN/C5AGtsqlONBeBr9c7LkuBKPCN3A6vrbbfKOYnJAHD3d4BdJ+2eADwT/vszwDXntKhmOMU4opK7b3H3kvDf9xL65s4mys7LacYRdTxkX3gzOfzHgSuA34b3R/w5gdOOJeqYWQ7wKeCX4W2jlc5JTAbAKXRz9y0Q+kcMZAVcz9m408yWhaeIInrKpCFmlgeMABYQxeflpHFAFJ6X8FTDEmA78BfgA6DK3WvC
h5QTJQF38ljcve68/E/4vDxqZm0CLLGpHgO+DtSGt7vSSuckngIgVvwCOI/QZe4W4JFgyzkzZtYB+B3wZXffE3Q9zdXAOKLyvLj7MXcfDuQAo4ABDR12bqtqnpPHYmaDgfuB/sAFQBfgGwGW2CgzuxrY7u6L6u9u4NAWOSfxFADbzKwHQPi/2wOup1ncfVv4G70W+F9C/2ijgpklE/qhOcPdXw7vjrrz0tA4ovm8ALh7FfA2ofsa6WaWFH4oB9gcVF3NUW8s48JTdu7uh4FpRP55GQOMN7NS4HlCUz+P0UrnJJ4CYBZwS/jvtwC/D7CWZqv7YRn2GWDFqY6NJOF5zF8Bq939J/UeiqrzcqpxRON5MbNMM0sP/70d8DFC9zT+BlwXPizizwmccizv1fvlwgjNm0f0eXH3+909x93zgInAW+4+hVY6JzG5EtjMngMuI/QRqtuA/wZeBV4EegEbgevdPaJvsJ5iHJcRmmZwoBT4Yt0ceiQzs0uAfwDL+dfc5n8Smj+PmvNymnFMIsrOi5kNJXRDMZHQL4MvuvsDZtaH0G+fXYDFwE3h36Aj1mnG8haQSWgaZQnwH/VuFkc0M7sM+Kq7X91a5yQmA0BERBoXT1NAIiJSjwJARCROKQBEROKUAkBEJE4pAERE4pQCQEQkTikARETi1P8HcdYIpuysdvQAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the cross-validation error curve\n",
    "test_means = grid_search.cv_results_['mean_test_score']\n",
    "test_stds = grid_search.cv_results_['std_test_score']\n",
    "train_means = grid_search.cv_results_['mean_train_score']\n",
    "train_stds = grid_search.cv_results_['std_train_score']\n",
    "\n",
    "x_axis = min_child_samples_s\n",
    "\n",
    "# scoring is neg_log_loss, so negate the scores to plot the log loss itself\n",
    "plt.plot(x_axis, -test_means)\n",
    "# plt.errorbar(x_axis, -test_means, yerr=test_stds, label='Test')\n",
    "# plt.errorbar(x_axis, -train_means, yerr=train_stds, label='Train')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([-0.49821541, -0.49199925, -0.48995372, -0.4889818 ])"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "test_means"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Refine the search at smaller values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Fitting 3 folds for each of 5 candidates, totalling 15 fits\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "[Parallel(n_jobs=4)]: Using backend LokyBackend with 4 concurrent workers.\n",
      "[Parallel(n_jobs=4)]: Done  12 out of  15 | elapsed: 37.9min remaining:  9.5min\n",
      "[Parallel(n_jobs=4)]: Done  15 out of  15 | elapsed: 43.1min finished\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "GridSearchCV(cv=StratifiedKFold(n_splits=3, random_state=3, shuffle=True),\n",
       "             error_score='raise-deprecating',\n",
       "             estimator=LGBMClassifier(boosting_type='goss', class_weight=None,\n",
       "                                      colsample_bytree=0.7,\n",
       "                                      importance_type='split',\n",
       "                                      learning_rate=0.1, max_bin=127,\n",
       "                                      max_depth=7, min_child_samples=20,\n",
       "                                      min_child_weight=0.001,\n",
       "                                      min_split_gain=0.0, n_estimators=368,\n",
       "                                      n_jobs=4, num_class=9, num_leaves=70,\n",
       "                                      objective='multiclass', random_state=None,\n",
       "                                      reg_alpha=0.0, reg_lambda=0.0,\n",
       "                                      silent=False, subsample=1.0,\n",
       "                                      subsample_for_bin=200000,\n",
       "                                      subsample_freq=0),\n",
       "             iid='warn', n_jobs=4,\n",
       "             param_grid={'min_child_samples': range(1, 10, 2)},\n",
       "             pre_dispatch='2*n_jobs', refit=False, return_train_score='warn',\n",
       "             scoring='neg_log_loss', verbose=5)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "params = {'boosting_type': 'goss',\n",
    "          'objective': 'multiclass',\n",
    "          'num_class': 9,\n",
    "          'n_jobs': 4,\n",
    "          'learning_rate': 0.1,\n",
    "          'n_estimators': n_estimators_1,\n",
    "          'max_depth': 7,\n",
    "          'num_leaves': 70,\n",
    "          'max_bin': 127,  # 2^7 - 1; the raw features are integer counts that rarely exceed 100\n",
    "          'colsample_bytree': 0.7,\n",
    "         }\n",
    "lg = LGBMClassifier(silent=False, **params)\n",
    "\n",
    "# Finer grid: min_child_samples in {1, 3, 5, 7, 9}\n",
    "min_child_samples_s = range(1, 10, 2)\n",
    "tuned_parameters = dict(min_child_samples=min_child_samples_s)\n",
    "\n",
    "# return_train_score=True so that cv_results_ contains 'mean_train_score' and\n",
    "# 'std_train_score', which the plotting cell reads\n",
    "grid_search = GridSearchCV(lg, n_jobs=4, param_grid=tuned_parameters, cv=kfold,\n",
    "                           scoring='neg_log_loss', verbose=5, refit=False,\n",
    "                           return_train_score=True)\n",
    "grid_search.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.4995051128120802\n",
      "{'min_child_samples': 9}\n"
     ]
    }
   ],
   "source": [
    "# examine the best model\n",
    "print(-grid_search.best_score_)\n",
    "print(grid_search.best_params_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAWoAAAD4CAYAAADFAawfAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAat0lEQVR4nO3deXxU9b3/8dcnC4awL1GBIFF2NxZTZYmKqAhCUen1Xu1Vq7UXrQhWrQrtXX7t7a1aq7cXcLlcXKvFuuGGoj4U1IBiwypKEFCUtQSRsC+Bz++PmSjEBCZhJufMzPv5eMyDSc7MmXfyiG9PvpnPOebuiIhIeGUEHUBERA5NRS0iEnIqahGRkFNRi4iEnIpaRCTkshKx09atW3tBQUEidi0ikpLmzp270d3zqtuWkKIuKCigpKQkEbsWEUlJZvZlTdu09CEiEnIqahGRkFNRi4iEnIpaRCTkVNQiIiGnohYRCTkVtYhIyIWmqHdX7GPSeyv428pNQUcREQmV0BS1Ozw6ayW/m7YEnSNbROQ7hy1qM+tqZgsOuG0xs1/EO0hOdia3nN+Fhas289rH6+O9exGRpHXYonb3pe7e0917AqcBO4CpiQgzonc+3Y5twh/eKGVPxf5EvISISNKp7dLHucAKd69xJv1IZGYYdwzpxpdf7+AvcxLyEiIiSae2RX0ZMKW6DWY20sxKzKykrKyszoEGdMmjX8dWjH9nOVt37a3zfkREUkXMRW1mDYDhwLPVbXf3Se5e6O6FeXnVnqkv1tdh3JDubNq+h/999/M670dEJFXU5oh6CDDP3f+eqDCVTslvxvAebZlc/Dnry3cl+uVEREKtNkV9OTUseyTCbRd0Zd9+57/f+qy+XlJEJJRiKmozywXOB15IbJzvtG+Zy1V9C3h27io++/vW+npZEZHQiamo3X2Hu7dy9/JEBzrQjed0otFRWdz9eml9vqyISKiEZjKxOi0aNeCGAZ14u3QDH37+ddBxREQCEeqiBrimfwFtmuVw52saLReR9BT6ov52tHx1OdM+Xhd0HBGRehf6ooYDRsunL9VouYiknaQo6swMY+yQbny1aQdPabRcRNJMUhQ1wNld8ujfqRUT3lnOFo2Wi0gaSZqiNjPGDq4cLV8RdBwRkXqTNEUNkdHyi3q25eHiLzRaLiJpI6mKGuCXg7qyfz8aLReRtJF0RR0ZLe/As3NXsXS9RstFJPUlXVEDjKocLZ+u0XIRSX1JWdQtGjVg1DmdeKd0Ax+s0Gi5iKS2pCxqgKv7FdC2WQ53vr6E/fs1Wi4iqStpizonO5NbBnVlkUbLRSTFJW1RA1zSqx3djm3CPW9otFxEUldSF3VmhjHuwu4aLReRlJbURQ1wVufWFHVqzfi3l2m0XERSUtIXtVnkhE3f7NjLQzM1Wi4iqSfpixrg5HbNuDg6Wr6ufGfQcURE4iolihrg1kFdcddouYiknpQp6srR8ufmrqZ0/Zag44iIxE1MRW1mzc3sOTMrNbMlZtY30cHq4saBnWisq5aLSIqJ9Yj6f4Dp7t4N6AEsSVykumueGxktn7G0jNkrNgYdR0QkLg5b1GbWFDgLeBjA3fe4++ZEB6urn/QroF3zhtz1eqlGy0UkJcRyRH0CUAY8ambzzWyymTWq+iAzG2lmJWZWUlZWFvegsaq8avmi1eW8qtFyEUkBsRR1FtAbeNDdewHbgbFVH+Tuk9y90N0L8/Ly4hyzdi7u1Y7ubZpyzxul7K7YF2gWEZEjFUtRrwZWu/uc6MfPESnu0MrMMMYN6caqTTt56sOvgo4jInJEDlvU7r4eWGVmXaOfOhf4NKGp4uCsLnkUdWrNhHeWUb5To+UikrxifdfHaOApM1sE9AR+n7hI8fPtaLmuWi4iSSymonb3BdH151Pd/WJ3/ybRweLh5HbNuKRXOx7RaLmIJLGUmUysya2DuuAO972p0XIRSU4p
X9T5LXL5Sb8OPDdPo+UikpxSvqghctXyJhotF5EklRZF3Ty3ATcOjI6WL9douYgkl7QoaoCr+kZGy+/UaLmIJJm0Keqc7ExuHdSFj9eU88qitUHHERGJWdoUNcDFPSOj5X98c6lGy0UkaaRVUWdkGL+6MDJa/qRGy0UkSaRVUQOc2TmPMztrtFxEkkfaFTXAHYO7Ub5To+UikhzSsqhPbteMS3pGRsvXbtZouYiEW1oWNcAtg7rgwH26armIhFzaFnV+i1yu7lfA8/NWs2SdRstFJLzStqgBRg3oRNOcbO6ertFyEQmvtC7qZrnZ3HhOJ2YuLWOWRstFJKTSuqgBruzbITpavkSj5SISSmlf1DnZmfzygi4sXrNFo+UiEkppX9QAF/Vox4ltmnLPGxotF5HwUVFTOVrendXf7OTPH3wZdBwRkYOoqKOKOrfmzM6tmThjuUbLRSRUVNQHGDskMlr+4EyNlotIeMRU1Ga20sw+NrMFZlaS6FBBOalt9Krls75gjUbLRSQkanNEfY6793T3woSlCYFbB3UFdNVyEQkPLX1U0a55Q67pV8AL8zVaLiLhEGtRO/Cmmc01s5HVPcDMRppZiZmVlJWVxS9hAG6IjpbfpauWi0gIxFrU/d29NzAEGGVmZ1V9gLtPcvdCdy/My8uLa8j61iw3m9EDO/HuZ2UUL9NouYgEK6aidve10X83AFOB0xMZKgw0Wi4iYXHYojazRmbWpPI+MAhYnOhgQTsqK5PbLujKJ2s1Wi4iwYrliPoYoNjMFgIfAdPcfXpiY4XD8B5tOaltU/4wXaPlIhKcwxa1u3/u7j2it5Pc/b/qI1gYZGQY44Z0Z81mjZaLSHD09rzDKOrcmrO65DHhneWU79BouYjUPxV1DMYO7saWXXt54N3lQUcRkTSkoo7BiW2bMqJXPo/OWqnRchGpdyrqGN0yqAsA9765NOAkIpJuVNQxate8Idf0L2Dq/DV8ulaj5SJSf1TUtXDDgE40a5jNXbpquYjUIxV1LTRrGLlq+XsaLReReqSirqUr+3Ygv4VGy0Wk/qioa+nA0fKXF2q0XEQST0VdBz88tS0nt4tctXzXXo2Wi0hiqajrQKPlIlKfVNR11L9Ta87ukhe5arlGy0UkgVTUR2DskOho+UyNlotI4qioj0D3Nk35Ue98Hp2t0XIRSRwV9RG65fwuGBotF5HEUVEfobbNG3JN/+OZOn8Nn6wtDzqOiKQgFXUc/HxAx8houa5aLiIJoKKOg8rR8veXbeT9ZWVBxxGRFKOijpNvR8tfK9VouYjElYo6TipHyz9dt4WXFq4JOo6IpBAVdRz98NS2nNKuGX984zONlotI3MRc1GaWaWbzzezVRAZKZpHR8m4aLReRuKrNEfVNwJJEBUkV/Tq1ZkDXPCa8s4zNO/YEHUdEUkBMRW1m+cBQYHJi46SGsUO6sXV3BQ/MXBF0FBFJAbEeUf8JuB3YX9MDzGykmZWYWUlZWXq/Ra3bsZHR8sdmrWT1NzuCjiMiSe6wRW1mw4AN7j73UI9z90nuXujuhXl5eXELmKxuOb8LZnDfm58FHUVEklwsR9T9geFmthJ4GhhoZk8mNFUKaNu8IT8tOp6pC9aweI1Gy0Wk7g5b1O4+zt3z3b0AuAx4x92vSHiyFPDzAR1p3jCbu3XVchE5AnofdQI1zcnmxoGdeX/ZRt77LL3X7UWk7mpV1O4+092HJSpMKrqiz3G0b9mQO1/XaLmI1I2OqBMsMlrejSXrtvDiAo2Wi0jtqajrwbBT2nBKu2bc+6ZGy0Wk9lTU9SAjwxh3YWS0/IkPVgYdR0SSjIq6nvTr2JpzuuYx8Z3lGi0XkVpRUdejsUO6s213BffP0FXLRSR2Kup61PXYJvyodz6Pz/6SVZs0Wi4isVFR17NbBkVHy9/SaLmIxEZFXc/aNGvItUWRq5ZrtFxEYqGiDsD1AzrSIlej5SISGxV1AJrmZDNao+UiEiMVdUCu6NOB41rm
cufrpezTaLmIHIKKOiANsjK47YKukdHy+RotF5GaqagDNPSUNpya34x731yq0XIRqZGKOkAZGcbYId1YW76Lx2evDDqOiISUijpg/Tq2ZmC3o7l/hkbLRaR6KuoQuGNwN42Wi0iNVNQh0PXYJvzDaRotF5HqqahD4ubzu5CRAfe+uTToKCISMirqkKgcLX9xwVqNlovIQVTUIXLd2ZHR8jtfX4K7hmBEJOKwRW1mOWb2kZktNLNPzOw39REsHTXNyWbMuZ2Ztfxr3lu2Meg4IhISsRxR7wYGunsPoCcw2Mz6JDZW+vrnM6Kj5a8t0Wi5iAAxFLVHbIt+mB29qUESpEFWBrcP7krp+q0aLRcRIMY1ajPLNLMFwAbgLXefU81jRppZiZmVlJXpjHBHYugpbeih0XIRiYqpqN19n7v3BPKB083s5GoeM8ndC929MC8vL94504qZMXZId9aW7+IxjZaLpL1avevD3TcDM4HBCUkj3+rbsRXnRkfLv9mu0XKRdBbLuz7yzKx59H5D4DxAlyapB3cM6cZ2jZaLpL1YjqjbADPMbBHwNyJr1K8mNpYAdDmmCZee1p4nPtBouUg6i+VdH4vcvZe7n+ruJ7v7b+sjmERUjpb/UaPlImlLk4khd2yzHH5WdAIvabRcJG2pqJPAdWefQMtGDfj9axotF0lHKuok0CQnmzEDOzF7xde8q6uWi6QdFXWS+PEZHejQKpe7dNVykbSjok4SlVctL12/lakaLRdJKyrqJKLRcpH0pKJOImbGuAu7s658F4/OWhl0HBGpJyrqJNPnhFac1/1oHpip0XKRdKGiTkJ3DI6Mlk/UaLlIWlBRJ6HOxzThHwvb88QHKzVaLpIGVNRJ6ubzu5CZYdzw1DxWf6OyFkllKuokdUzTHCZe3puVG7czdHwxM0o3BB1JRBJERZ3EzjvxGF4ZXUS75g255rG/8YfppVTs2x90LBGJMxV1kito3YgXbujH5ae354GZK7ji4Tls2Lor6FgiEkcq6hSQk53JnSNO5d5Le7Bg1WaGji/mw8+/DjqWiMSJijqF/Oi0fF4aVUSTnCx+/H8fcv+M5ezXeUFEkp6KOsV0PbYJL99YxIWntOGeN5bysydK2LxDgzEiyUxFnYIaH5XFhMt78duLTuL9ZWUMHV/MwlWbg44lInWkok5RZsZVfQt49vp+APzDQ7N54oOVuvCASBJSUae4nu2bM21MEWd2zuPfX/qEMU8vYNvuiqBjiUgtqKjTQPPcBky+qpDbB3dl2qK1DJ9YzNL1W4OOJSIxOmxRm1l7M5thZkvM7BMzu6k+gkl8ZWQYNwzoxFM/68OWnRVcdH8xz89dHXQsEYlBLEfUFcCt7t4d6AOMMrMTExtLEqVvx1a8dlMRPds359ZnFzLuhUW6CIFIyB22qN19nbvPi97fCiwB2iU6mCTO0U1yePLaMxh1TkemfLSKEQ/M5suvtwcdS0RqUKs1ajMrAHoBc6rZNtLMSsyspKxMV8oOu6zMDG67oBuPXF3Ims07GTa+mOmL1wcdS0SqEXNRm1lj4HngF+6+pep2d5/k7oXuXpiXlxfPjJJAA7sdw7QxRZyQ14jrn5zL7179lL06sZNIqMRU1GaWTaSkn3L3FxIbSepbfotcnrm+Lz/p24HJxV9w2aQPWVe+M+hYIhIVy7s+DHgYWOLu9yU+kgThqKxMfnPRyUy4vBel67YwdHwx7y/TEpZIGMRyRN0fuBIYaGYLorcLE5xLAvLDHm15eXQReY2P4qpHPuK/3/qMfTqxk0igsg73AHcvBqweskhIdMxrzIuj+vPrFz/mf95exryvvuFP/9STVo2PCjqaSFrSZKJUq2GDTO69tAd3jTiFOV9sYuj4YkpWbgo6lkhaUlFLjcyMy04/jqk39OOo7Awum/Qhk9//XCd2EqlnKmo5rJPaNuOV0UWc2/1ofjdtCdc/OZfynXuDjiWSNlTUEpOmOdk8dMVp/OvQ7ry9ZAM/nFDM
4jXlQccSSQsqaomZmfGzM0/gr9f1YU/FfkY8OJspH32lpRCRBFNRS62d1qEl08YUccbxLRn3wsfc+sxCduzROa5FEkVFLXXSqvFRPHbN6dx8XhemLljDxffPYvmGbUHHEklJKmqps8wM46bzOvPET09n47Y9DJ9YzMsL1wYdSyTlqKjliJ3ZOY/XxpzJiW2aMmbKfP7txcXsrtA5rkXiRUUtcXFssxymjOzDv5x5PH/+8EsufegDVm3aEXQskZSgopa4yc7M4NdDT+R/rzyNLzZuZ9iEYt5e8vegY4kkPRW1xN0FJx3Lq6OLyG/RkGsfL+Hu6aVU6BzXInWmopaE6NCqEc//vB+Xn34cD85cwT9PnsOGLbuCjiWSlFTUkjA52ZncOeIU7vvHHixaXc6F44uZvWJj0LFEko6KWhJuRO98XrqxP00bZnHF5DncP2M5+3WOa5GYqailXnQ5pgkv31jE0FPbcs8bS7n28b/xzfY9QccSSQoqaqk3jY/KYvxlPfnPi06iePlGhk0oZsGqzUHHEgk9FbXUKzPjyr4FPHd9PwAufWg2j89eqRM7iRyCiloC0aN9c6aNKeKsznn8x8ufcOOU+WzbrRM7iVRHRS2BaZ7bgP+7qpDbB3fl9Y/XMXxCMaXrtwQdSyR0VNQSqIwM44YBnfjLv/Rh6+4KLr5/Fs/NXR10LJFQOWxRm9kjZrbBzBbXRyBJT31OaMW0MUX0bN+cXz67kDueW8SuvTqxkwjEdkT9GDA4wTlEOLpJDk9eewajzunIX0tWcckDs/li4/agY4kE7rBF7e7vAZvqIYsIWZkZ3HZBNx69+gesK9/J8AnFTF+8LuhYIoGK2xq1mY00sxIzKykrK4vXbiVNndPtaF4dXcQJRzfm+ifn8dtXPmVPhU7sJOkpbkXt7pPcvdDdC/Py8uK1W0lj+S1yefa6vlzdr4BHZn3BZZM+YO3mnUHHEql3eteHhFqDrAz+3/CTmPjjXixdv5Wh49/n3c/0G5ukFxW1JIVhp7bl5dFFHN0kh6sf/Yj73vqMfTqxk6SJWN6eNwX4AOhqZqvN7NrExxL5vo55jXlxVH9G9Mpn/NvL+MkjH7Fx2+6gY4kknCXiHAuFhYVeUlIS9/2KALg7z5Ss4t9f+oTmudlM/HFvflDQMuhYIkfEzOa6e2F127T0IUnHzPinHxzHCzf0Iyc7k8smfcik91boxE6SslTUkrROatuMV0YXcX73Y/j9a6WM/PNcynfuDTqWSNypqCWpNc3J5sErevNvw05kRukGhk14n8VryoOOJRJXKmpJembGtUXH89fr+lCxzxnx4GyemvOllkIkZaioJWWc1qEl08acyRnHt+TXUxdzyzML2bFH57iW5KeilpTSslEDHrvmdG4+rwsvLljDRRNnsXzD1qBjiRwRFbWknMwM46bzOvPnn57Bpu17GD5xFi8tWBN0LJE6U1FLyirq3JppY87kxDZNuenpBfzrix+zbXcFuyv2fXvbU7H/oNvefQffKg647dvvB932H3BzP/gmEk8aeJGUt3fffu55YymT3vs86CgAmFX5+KBtVuO2qs+1qlsPud+q2+wQ26q+pn3/8/b9uwdmr7xb09dmVR5X9dHVP//7+a2aHFVfq9rnf7v/6r8P1X5N1bxQ1ddsmduAZ67v+73XjsWhBl6y6rRHkSSSnZnBry7sztld8liwanONjzvwoKXq8UvVw5kDt3uVrQdvO8QTq2z//mvGvt9D5TnUh1UP1Gr6umvaf3XHeZX7rOlrq3x+TV/Pd5+v/nUq79eU43CZ8YMfd2Dmmp9f8+MO/KBJTmIqVUUtaaN/p9b079Q66BgitaY1ahGRkFNRi4iEnIpaRCTkVNQiIiGnohYRCTkVtYhIyKmoRURCTkUtIhJyCRkhN7My4Ms6Pr01sDGOceJFuWpHuWpHuWonFXN1cPe86jYkpKiPhJmV1DTvHiTlqh3lqh3lqp10y6WlDxGRkFNRi4iEXBiL
elLQAWqgXLWjXLWjXLWTVrlCt0YtIiIHC+MRtYiIHEBFLSIScqEpajN7xMw2mNnioLNUMrP2ZjbDzJaY2SdmdlPQmQDMLMfMPjKzhdFcvwk604HMLNPM5pvZq0FnOZCZrTSzj81sgZmF5lpxZtbczJ4zs9Loz1rdruUU30xdo9+nytsWM/tF0LkAzOzm6M/9YjObYmY5QWcCMLObopk+iff3KjRr1GZ2FrANeMLdTw46D4CZtQHauPs8M2sCzAUudvdPA85lQCN332Zm2UAxcJO7fxhkrkpmdgtQCDR192FB56lkZiuBQncP1aCEmT0OvO/uk82sAZDr7jVfM6yemVkmsAY4w93rOsgWryztiPy8n+juO83sGeA1d38s4FwnA08DpwN7gOnAz919WTz2H5ojand/D9gUdI4Dufs6d58Xvb8VWAK0CzYVeMS26IfZ0Vso/o9rZvnAUGBy0FmSgZk1Bc4CHgZw9z1hKumoc4EVQZf0AbKAhmaWBeQCawPOA9Ad+NDdd7h7BfAucEm8dh6aog47MysAegFzgk0SEV1eWABsAN5y91DkAv4E3A7sDzpINRx408zmmtnIoMNEnQCUAY9Gl4smm1mjoENVcRkwJegQAO6+Bvgj8BWwDih39zeDTQXAYuAsM2tlZrnAhUD7eO1cRR0DM2sMPA/8wt23BJ0HwN33uXtPIB84PfqrV6DMbBiwwd3nBp2lBv3dvTcwBBgVXW4LWhbQG3jQ3XsB24GxwUb6TnQpZjjwbNBZAMysBXARcDzQFmhkZlcEmwrcfQlwN/AWkWWPhUBFvPavoj6M6Brw88BT7v5C0Hmqiv6aPBMYHHAUgP7A8Oha8NPAQDN7MthI33H3tdF/NwBTiawnBm01sPqA34ieI1LcYTEEmOfufw86SNR5wBfuXubue4EXgH4BZwLA3R92997ufhaRZdy4rE+DivqQon+0exhY4u73BZ2nkpnlmVnz6P2GRH54S4NNBe4+zt3z3b2AyK/L77h74Ec7AGbWKPoHYaJLC4OI/LoaKHdfD6wys67RT50LBPrH6iouJyTLHlFfAX3MLDf63+e5RP52FDgzOzr673HACOL4fcuK146OlJlNAQYArc1sNfAf7v5wsKnoD1wJfBxdDwb4lbu/FmAmgDbA49G/xmcAz7h7qN4KF0LHAFMj/22TBfzF3acHG+lbo4GnossMnwPXBJwHgOha6/nAdUFnqeTuc8zsOWAekaWF+YRnnPx5M2sF7AVGufs38dpxaN6eJyIi1dPSh4hIyKmoRURCTkUtIhJyKmoRkZBTUYuIhJyKWkQk5FTUIiIh9/8B5zMbl7GJVQQAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Plot the cross-validation error curve\n",
    "test_means = grid_search.cv_results_['mean_test_score']\n",
    "test_stds = grid_search.cv_results_['std_test_score']\n",
    "train_means = grid_search.cv_results_['mean_train_score']\n",
    "train_stds = grid_search.cv_results_['std_train_score']\n",
    "\n",
    "x_axis = min_child_samples_s\n",
    "\n",
    "# scoring is neg_log_loss, so negate the scores to plot the log loss itself\n",
    "plt.plot(x_axis, -test_means)\n",
    "# plt.errorbar(x_axis, -test_means, yerr=test_stds, label='Test')\n",
    "# plt.errorbar(x_axis, -train_means, yerr=train_stds, label='Train')\n",
    "\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### min_child_samples=10"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Column subsampling parameter: sub_feature / feature_fraction / colsample_bytree"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "params = {'boosting_type': 'goss',\n",
    "          'objective': 'multiclass',\n",
    "          'num_class': 9,\n",
    "          'n_jobs': 4,\n",
    "          'learning_rate': 0.1,\n",
    "          'n_estimators': n_estimators_1,\n",
    "          'max_depth': 7,\n",
    "          'num_leaves': 70,\n",
    "          'min_child_samples': 10,\n",
    "          'max_bin': 127,  # 2^7 - 1; the raw features are integer counts that rarely exceed 100\n",
    "          # 'colsample_bytree': 0.7,  # tuned by the grid search below\n",
    "         }\n",
    "lg = LGBMClassifier(silent=False, **params)\n",
    "\n",
    "# Candidate column-sampling ratios: 0.5, 0.6, 0.7, 0.8, 0.9\n",
    "colsample_bytree_s = [i/10.0 for i in range(5, 10)]\n",
    "tuned_parameters = dict(colsample_bytree=colsample_bytree_s)\n",
    "\n",
    "# return_train_score=True so that cv_results_ contains 'mean_train_score' and\n",
    "# 'std_train_score', which the plotting cell reads\n",
    "grid_search = GridSearchCV(lg, n_jobs=4, param_grid=tuned_parameters, cv=kfold,\n",
    "                           scoring='neg_log_loss', verbose=5, refit=False,\n",
    "                           return_train_score=True)\n",
    "#grid_search.best_estimator_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "0.4958609189767495\n",
      "{'colsample_bytree': 0.5}\n"
     ]
    }
   ],
   "source": [
    "# examine the best model\n",
    "print(-grid_search.best_score_)\n",
    "print(grid_search.best_params_)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# Plot the cross-validation error curve\n",
    "# Note: the train-score keys exist only if GridSearchCV was run with return_train_score=True\n",
    "test_means = grid_search.cv_results_['mean_test_score']\n",
    "test_stds = grid_search.cv_results_['std_test_score']\n",
    "train_means = grid_search.cv_results_['mean_train_score']\n",
    "train_stds = grid_search.cv_results_['std_train_score']\n",
    "\n",
    "x_axis = colsample_bytree_s\n",
    "\n",
    "# scoring is neg_log_loss, so negate the scores to plot the log loss itself\n",
    "plt.plot(x_axis, -test_means)\n",
    "# plt.errorbar(x_axis, -test_means, yerr=test_stds, label='Test')\n",
    "# plt.errorbar(x_axis, -train_means, yerr=train_stds, label='Train')\n",
    "\n",
    "plt.show()"
   ]
  },
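  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The best value sits at the lower edge of the grid, so try smaller values. The feature set combines the original features with the TF-IDF features, so there are rather many of them."
   ]
  },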
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "params = {'boosting_type': 'goss',\n",
    "          'objective': 'multiclass',\n",
    "          'num_class': 9,\n",
    "          'n_jobs': 4,\n",
    "          'learning_rate': 0.1,\n",
    "          'n_estimators': n_estimators_1,\n",
    "          'max_depth': 7,\n",
    "          'num_leaves': 70,\n",
    "          'min_child_samples': 10,\n",
    "          'max_bin': 127,  # 2^7 - 1; the raw features are integer counts that rarely exceed 100\n",
    "          # 'colsample_bytree': 0.7,  # tuned by the grid search below\n",
    "         }\n",
    "lg = LGBMClassifier(silent=False, **params)\n",
    "\n",
    "# Smaller candidate column-sampling ratios: 0.3 and 0.4\n",
    "colsample_bytree_s = [i/10.0 for i in range(3, 5)]\n",
    "tuned_parameters = dict(colsample_bytree=colsample_bytree_s)\n",
    "\n",
    "# return_train_score=True so that cv_results_ contains 'mean_train_score' and\n",
    "# 'std_train_score', in case the CV curve is plotted afterwards\n",
    "grid_search = GridSearchCV(lg, n_jobs=4, param_grid=tuned_parameters, cv=kfold,\n",
    "                           scoring='neg_log_loss', verbose=5, refit=False,\n",
    "                           return_train_score=True)\n",
    "grid_search.fit(X_train , y_train)\n",
    "#grid_search.best_estimator_"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "# examine the best model\n",
    "print(-grid_search.best_score_)\n",
    "print(grid_search.best_params_)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### colsample_bytree=0.4"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Regularization parameters lambda_l1 (reg_alpha) and lambda_l2 (reg_lambda): tuning them seems unnecessary here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Lower the learning rate and re-tune n_estimators"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "params = {'boosting_type': 'goss',\n",
    "          'objective': 'multiclass',\n",
    "          'num_class':9, \n",
    "          'n_jobs': 4,\n",
    "          'learning_rate': 0.01,\n",
    "          #'n_estimators':n_estimators_1,\n",
    "          'max_depth': 7,\n",
    "          'num_leaves':70,\n",
    "          'min_child_samples':10,\n",
    "          'max_bin': 127,  # 2^7 - 1; the original features are integers and rarely exceed 100\n",
    "          'colsample_bytree': 0.4,\n",
    "         }\n",
    "n_estimators_2 = get_n_estimators(params , X_train , y_train)"
   ]
  },
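  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retrain the model on all training data with the best parameters\n",
    "With more samples available, the model complexity can perhaps be increased slightly:\n",
    "- num_leaves is increased by 5, from 70 to 75\n",
    "- min_child_samples is increased in proportion to the sample count, from 10 to 15"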
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "params = {'boosting_type': 'goss',\n",
    "          'objective': 'multiclass',\n",
    "          'num_class':9, \n",
    "          'n_jobs': 4,\n",
    "          'learning_rate': 0.01,\n",
    "          'n_estimators':n_estimators_2,\n",
    "          'max_depth': 7,\n",
    "          'num_leaves':75,\n",
    "          'min_child_samples':15,\n",
    "          'max_bin': 127,  # 2^7 - 1; the original features are integers and rarely exceed 100\n",
    "          'colsample_bytree': 0.4,\n",
    "         }\n",
    "\n",
    "lg = LGBMClassifier(silent=False,  **params)\n",
    "lg.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Save the model for later testing"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pickle\n",
    "\n",
    "# Use a context manager so the file is flushed and closed after writing\n",
    "with open(dpath + \"Otto_LightGBM_goss_org_tfidf.pkl\", 'wb') as f:\n",
    "    pickle.dump(lg, f)"
   ]
  },
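  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the reverse step, assuming the pickle file written above is reloaded later for testing. To stay self-contained it round-trips a small stand-in dictionary through a temporary file; `stand_in_model`, `tmp_path` and `restored_model` are hypothetical names, and in practice the object would be the fitted `LGBMClassifier` read back from `dpath`. Pickle files should only be loaded from trusted sources."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pickle\n",
    "import tempfile\n",
    "\n",
    "# Hypothetical stand-in for the fitted classifier (any picklable object works)\n",
    "stand_in_model = {'boosting_type': 'goss', 'num_leaves': 75}\n",
    "\n",
    "# Write to a temporary directory so the sketch does not touch ./data/\n",
    "tmp_path = os.path.join(tempfile.mkdtemp(), 'model_sketch.pkl')\n",
    "with open(tmp_path, 'wb') as f:\n",
    "    pickle.dump(stand_in_model, f)\n",
    "\n",
    "# Reload the object and check that it equals the one that was saved\n",
    "with open(tmp_path, 'rb') as f:\n",
    "    restored_model = pickle.load(f)\n",
    "\n",
    "print(restored_model == stand_in_model)"
   ]
  },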
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Feature importance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Pair each feature name with its importance and sort in descending order\n",
    "df = pd.DataFrame({\"columns\": list(feat_names), \"importance\": lg.feature_importances_})\n",
    "df = df.sort_values(by='importance', ascending=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": false
   },
   "outputs": [],
   "source": [
    "plt.bar(range(len(lg.feature_importances_)), lg.feature_importances_)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "tfidf的特征重要性更高一些。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  },
  "pycharm": {
   "stem_cell": {
    "cell_type": "raw",
    "metadata": {
     "collapsed": false
    },
    "source": []
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
